Async Chat — Bypass API Gateway 29s Timeout
System Intent
POST /memories/chat currently blocks for ~56 seconds while the ReasoningAgent runs. API Gateway has a hard 29s limit, so the connection is cut and the client gets a 504 — even though the Lambda finishes and saves the answer to the DB.
Solution: Split the Lambda into a fast dispatcher path (~1s) and an async worker path (~56s). The dispatcher saves the user message, invokes the same Lambda asynchronously (InvocationType=Event) with _async_worker=true, and returns {chat_id, status: "thinking"} immediately. The app polls GET /chats/messages every 2-3s until the assistant message appears.
Mermaid Diagram
graph TB
User["User App"]
APIGateway["API Gateway (29s timeout)"]
DispatcherLambda["MemoriesChatFunction Dispatcher (~1s)"]
AsyncWorkerLambda["MemoriesChatFunction Worker (~56s) InvocationType=Event"]
DynamoDB["DynamoDB Messages + Chats"]
User -->|POST /memories/chat| APIGateway
APIGateway --> DispatcherLambda
DispatcherLambda -->|1. save user message| DynamoDB
DispatcherLambda -->|2. invoke self async| AsyncWorkerLambda
DispatcherLambda -->|3. return chat_id + status=thinking| APIGateway
APIGateway -->|200 OK within 1s| User
AsyncWorkerLambda -->|run ReasoningAgent 56s| AsyncWorkerLambda
AsyncWorkerLambda -->|save assistant answer| DynamoDB
User -->|poll GET /chats/messages every 2-3s| APIGateway
APIGateway -->|messages list| User Implementation Flows
Flow 1: Dispatcher path in chat/app.py
In main/server/api/memories/chat/app.py, add a dispatcher entry point.
At the top of lambda_handler, check for _async_worker in the event. If true, fall through to the existing implementation() call (worker path). If false, run the new dispatcher logic:
- Parse question and chat_id from payload
- Validate question is non-empty
- Create or validate chat ownership via repository
- Save user message via
repo.save_message(chat_id, "user", question) - Invoke self asynchronously via boto3 Lambda client with
InvocationType="Event"and_async_worker=Trueadded to payload - Return
{"chat_id": chat_id, "status": "thinking"}with HTTP 200
The existing implementation() function is the worker path — no changes to it.
Files modified: - main/server/api/memories/chat/app.py
Flow 2: IAM permission for self-invocation in template.yaml
The Lambda execution role needs lambda:InvokeFunction on its own ARN.
In main/server/template.yaml, find the policy that grants Lambda invoke permissions and add !GetAtt MemoriesChatFunction.Arn to the resource list.
Files modified: - main/server/template.yaml
Flow 3: Tests for dispatcher and worker paths
Add unit tests covering: - Dispatcher returns {chat_id, status: "thinking"} within the function (mock async invoke) - Dispatcher saves user message before returning - Worker path (_async_worker=True) calls implementation() directly - Dispatcher returns 400 for missing question - Dispatcher returns 403 for chat owned by another user
Files created: - main/server/tests/unit/test_async_chat_dispatcher.py
Acceptance Criteria
- [ ] POST /memories/chat returns
{chat_id, status: "thinking"}in < 2 seconds with a warm Lambda - [ ] Async worker Lambda runs and saves assistant message to DB within 70 seconds
- [ ] GET /chats/messages returns the assistant message after polling
- [ ] Existing
implementation()function is unchanged - [ ] Template.yaml grants self-invocation permission
- [ ] Unit tests pass for dispatcher and worker routing