Async Chat — Bypass API Gateway 29s Timeout

System Intent

POST /memories/chat currently blocks for ~56 seconds while the ReasoningAgent runs. API Gateway times out the integration at 29s, so the connection is cut and the client gets a 504, even though the Lambda keeps running, finishes, and saves the answer to the DB.

Solution: Split the Lambda into a fast dispatcher path (~1s) and an async worker path (~56s). The dispatcher saves the user message, invokes the same Lambda asynchronously (InvocationType=Event) with _async_worker=true, and returns {chat_id, status: "thinking"} immediately. The app polls GET /chats/messages every 2-3s until the assistant message appears.
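
The polling side of this design can be sketched as below. This is a minimal illustration, not the app's actual client code: wait_for_assistant and fetch_messages are hypothetical names, fetch_messages stands in for the GET /chats/messages call, and the message shape (dicts with a "role" key, returned oldest first) is an assumption.

```python
import time
from typing import Callable, Optional


def wait_for_assistant(
    fetch_messages: Callable[[], list],
    interval_s: float = 2.5,
    timeout_s: float = 90.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the newest message comes from the assistant, or give up.

    fetch_messages is a stand-in for GET /chats/messages and is assumed to
    return the chat's messages oldest first, each a dict with a "role" key.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        messages = fetch_messages()
        # The dispatcher saved the user message last, so the answer is ready
        # once the newest message is no longer the user's.
        if messages and messages[-1].get("role") == "assistant":
            return messages[-1]
        sleep(interval_s)
    # Timed out: the caller decides whether to show an error or keep waiting.
    return None
```

A client-side timeout matters here because a failed worker never writes an assistant message, so an unbounded loop would poll forever.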

Mermaid Diagram

graph TB
    User["User App"]
    APIGateway["API Gateway (29s timeout)"]
    DispatcherLambda["MemoriesChatFunction Dispatcher (~1s)"]
    AsyncWorkerLambda["MemoriesChatFunction Worker (~56s) InvocationType=Event"]
    DynamoDB["DynamoDB Messages + Chats"]

    User -->|POST /memories/chat| APIGateway
    APIGateway --> DispatcherLambda
    DispatcherLambda -->|1. save user message| DynamoDB
    DispatcherLambda -->|2. invoke self async| AsyncWorkerLambda
    DispatcherLambda -->|3. return chat_id + status=thinking| APIGateway
    APIGateway -->|200 OK within 1s| User

    AsyncWorkerLambda -->|run ReasoningAgent 56s| AsyncWorkerLambda
    AsyncWorkerLambda -->|save assistant answer| DynamoDB

    User -->|poll GET /chats/messages every 2-3s| APIGateway
    APIGateway -->|messages list| User

Implementation Flows

Flow 1: Dispatcher path in chat/app.py

In main/server/api/memories/chat/app.py, add a dispatcher entry point.

At the top of lambda_handler, check the event for _async_worker. If it is set, fall through to the existing implementation() call (worker path). Otherwise, run the new dispatcher logic:

  1. Parse question and chat_id from payload
  2. Validate question is non-empty
  3. Create or validate chat ownership via repository
  4. Save user message via repo.save_message(chat_id, "user", question)
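  5. Invoke self asynchronously via the boto3 Lambda client with InvocationType="Event", forwarding the original event with _async_worker=True added and the chat_id from step 3 included, so the worker writes to the same chat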
  6. Return {"chat_id": chat_id, "status": "thinking"} with HTTP 200
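
The steps above could look roughly like the following sketch. It assumes an API Gateway proxy event whose "body" is a JSON string. Several names are hypothetical and chosen only for illustration: make_handler, user_id_of, _response, and the repository methods create_chat and user_owns_chat. Only save_message and implementation() come from the plan, and implementation() is replaced here by a stand-in so the block runs on its own.

```python
import json


def implementation(event, context):
    """Stand-in for the existing worker entry point (unchanged in the plan)."""
    return {"statusCode": 200, "body": json.dumps({"status": "done"})}


def _response(status, body):
    return {"statusCode": status, "body": json.dumps(body)}


def make_handler(repo, lambda_client, user_id_of):
    """Build lambda_handler with its collaborators injected for testability."""

    def lambda_handler(event, context):
        # Worker path: the async self-invocation carries the flag.
        if event.get("_async_worker"):
            return implementation(event, context)

        # Steps 1-2: parse and validate.
        payload = json.loads(event.get("body") or "{}")
        question = (payload.get("question") or "").strip()
        if not question:
            return _response(400, {"error": "question is required"})

        # Step 3: create the chat or check ownership (hypothetical repo methods).
        user_id = user_id_of(event)
        chat_id = payload.get("chat_id")
        if chat_id is None:
            chat_id = repo.create_chat(user_id)
        elif not repo.user_owns_chat(user_id, chat_id):
            return _response(403, {"error": "forbidden"})

        # Step 4: persist the user message before handing off.
        repo.save_message(chat_id, "user", question)

        # Step 5: InvocationType="Event" queues the call and returns at once.
        worker_event = {
            **event,
            "_async_worker": True,
            "body": json.dumps({**payload, "chat_id": chat_id}),
        }
        lambda_client.invoke(
            FunctionName=context.invoked_function_arn,
            InvocationType="Event",
            Payload=json.dumps(worker_event).encode("utf-8"),
        )

        # Step 6: answer immediately; the client polls for the result.
        return _response(200, {"chat_id": chat_id, "status": "thinking"})

    return lambda_handler
```

In app.py the handler would be wired once at import time, for example with boto3.client("lambda") as lambda_client. Injecting the client rather than creating it inside the handler keeps the dispatcher testable with a mock and reuses one client across warm invocations.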

The existing implementation() function is the worker path — no changes to it.

Files modified: main/server/api/memories/chat/app.py

Flow 2: IAM permission for self-invocation in template.yaml

The Lambda execution role needs lambda:InvokeFunction on its own ARN.

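In main/server/template.yaml, find the policy that grants Lambda invoke permissions and add !GetAtt MemoriesChatFunction.Arn to the resource list. If that policy is attached inline to MemoriesChatFunction's own execution role, the !GetAtt reference creates a circular dependency (the function depends on its role, and the role would depend on the function); in that case build the ARN with !Sub from a fixed function name instead.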

Files modified: main/server/template.yaml

Flow 3: Tests for dispatcher and worker paths

Add unit tests covering:

  • Dispatcher returns {chat_id, status: "thinking"} without waiting for the worker (mock the async invoke)
  • Dispatcher saves the user message before returning
  • Worker path (_async_worker=True) calls implementation() directly
  • Dispatcher returns 400 for a missing question
  • Dispatcher returns 403 for a chat owned by another user

Files created: main/server/tests/unit/test_async_chat_dispatcher.py
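
As a sketch of the assertion patterns these tests might use, the block below runs against a tiny stand-in dispatch function rather than the real app.py, so that it is self-contained. The dispatch function, its signature, and the fixed "chat-1" id are assumptions made for illustration. The real tests would import lambda_handler and patch its repository and boto3 client instead. The 403 case is omitted because it depends on the repository's ownership API.

```python
import json
from unittest.mock import MagicMock


def dispatch(event, repo, lambda_client):
    """Tiny stand-in for the dispatcher path, only so the tests below run."""
    if event.get("_async_worker"):
        return {"statusCode": 200, "body": json.dumps({"worker": True})}
    question = json.loads(event.get("body") or "{}").get("question")
    if not question:
        return {"statusCode": 400, "body": json.dumps({"error": "question is required"})}
    repo.save_message("chat-1", "user", question)
    lambda_client.invoke(
        FunctionName="MemoriesChatFunction",
        InvocationType="Event",
        Payload=json.dumps({**event, "_async_worker": True}),
    )
    return {"statusCode": 200, "body": json.dumps({"chat_id": "chat-1", "status": "thinking"})}


def test_dispatcher_saves_then_invokes_then_returns_thinking():
    # A shared parent mock records calls to both collaborators in order.
    parent = MagicMock()
    resp = dispatch({"body": json.dumps({"question": "hi"})}, parent.repo, parent.lambda_client)

    names = [c[0] for c in parent.mock_calls]
    assert names == ["repo.save_message", "lambda_client.invoke"]
    assert parent.lambda_client.invoke.call_args.kwargs["InvocationType"] == "Event"
    assert json.loads(resp["body"]) == {"chat_id": "chat-1", "status": "thinking"}


def test_missing_question_returns_400_without_side_effects():
    parent = MagicMock()
    resp = dispatch({"body": "{}"}, parent.repo, parent.lambda_client)

    assert resp["statusCode"] == 400
    assert parent.mock_calls == []  # nothing saved, nothing invoked


def test_worker_flag_bypasses_dispatcher():
    parent = MagicMock()
    resp = dispatch({"_async_worker": True}, parent.repo, parent.lambda_client)

    assert json.loads(resp["body"]) == {"worker": True}
    assert parent.mock_calls == []  # the worker path never re-dispatches
```

Recording both collaborators on one parent mock is what lets the first test check ordering, that is, that the user message is saved before the async invoke is issued.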

Acceptance Criteria

  • [ ] POST /memories/chat returns {chat_id, status: "thinking"} in < 2 seconds with a warm Lambda
  • [ ] Async worker Lambda runs and saves assistant message to DB within 70 seconds
  • [ ] GET /chats/messages returns the assistant message after polling
  • [ ] Existing implementation() function is unchanged
  • [ ] template.yaml grants self-invocation permission
  • [ ] Unit tests pass for dispatcher and worker routing