Chat Message-ID Polling
System Intent
Change the chat flow so that POST /memories/chat returns a message_id as soon as the placeholder message is saved, without waiting for the answer. GPU startup is still synchronous, so gpu_unavailable is returned if no GPU can be resolved. The app then polls POST /chats/messages for that specific message_id until ready=true. This decouples message creation from content delivery.
Mermaid Diagram
graph TD
ChatScreen["Chat Screen\nmain/app/app/chat.tsx"]:::updated -->|"POST /memories/chat\n{question, chat_id?}"| APIGW["API Gateway"]:::unchanged
APIGW --> Dispatcher["Dispatcher Lambda\napi/memories/chat/app.py"]:::updated
Dispatcher -->|"1 find running EC2\n2 start stopped\n3 launch from template\nsync 2-3s"| GPURes["GPU Resolution"]:::updated
GPURes -->|"no GPU"| GPUFail["gpu_unavailable\nexisting error message"]:::unchanged
GPUFail --> APIGW
GPURes -->|"GPU found"| Save["save user message\nsave placeholder assistant\nready=false content=''"]:::created
Save -->|"InvocationType=Event\nmessage_id passed"| Worker["Async Worker Lambda"]:::updated
Save -->|"return 1-2s"| APIGW
APIGW -->|"200 chat_id message_id status=thinking"|ChatScreen
ChatScreen -->|"poll every 500ms\nPOST /chats/messages\n{chat_id, message_id}"|GetMsgs["ChatMessagesFunction\napi/chats/messages/app.py"]:::updated
GetMsgs -->|"point-read by message_id"|DDB["DynamoDB Messages\nready bool added"]:::updated
DDB -->|"ready=false"|GetMsgs
GetMsgs -->|"still waiting"|ChatScreen
Worker -->|"ReasoningAgent.answer\nwait_for_gpu_health"| GPU["GPU Worker FastAPI"]:::unchanged
GPU --> Worker
Worker -->|"UPDATE SET ready=true content=answer"|DDB
DDB -->|"ready=true"|GetMsgs
GetMsgs -->|"stop polling render message"|ChatScreen
classDef unchanged fill:#d3d3d3,stroke:#888
classDef updated fill:#ffe58a,stroke:#888
classDef created fill:#a8e6a3,stroke:#888
Black-Box Input/Output Contracts
Flow: send-message — POST /memories/chat dispatcher
Inputs: question (required), chat_id (optional), verbose (optional bool)
Outputs — GPU found:
HTTP 200 with { "chat_id": "...", "message_id": "...", "status": "thinking" }, within ~1-2s.
Outputs — GPU unavailable:
HTTP 200 with { "error": "gpu_unavailable" } (existing behavior, existing error copy in frontend unchanged).
Side effects (GPU found only):
- New chat created in Chats table if no chat_id
- User message saved to Messages table (role=user)
- Placeholder assistant message saved to Messages table (role=assistant, content="", ready=false)
- Async worker Lambda invoked with message_id
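The send-message contract above can be sketched as a small, dependency-injected function. This is a minimal illustration, not the actual _dispatch_async() code: the ChatRepo protocol, its create_chat() method, the argument order of save_message(), and the resolve_gpu / invoke_worker callables are all assumptions made for the example. Only the payload keys and the response shape come from this spec.

```python
import uuid
from typing import Callable, Optional, Protocol


class ChatRepo(Protocol):
    """Hypothetical repository interface; signatures are assumed for this sketch."""

    def create_chat(self, user_id: str) -> str: ...
    def save_message(self, chat_id: str, role: str, content: str) -> str: ...
    def save_placeholder_message(self, chat_id: str, message_id: str) -> None: ...


def dispatch_chat(
    body: dict,
    user_id: str,
    repo: ChatRepo,
    resolve_gpu: Callable[[], Optional[str]],
    invoke_worker: Callable[[dict], None],
) -> dict:
    """Build the response body for POST /memories/chat."""
    # GPU resolution stays synchronous; return before any DynamoDB write.
    if resolve_gpu() is None:
        return {"error": "gpu_unavailable"}

    question = body["question"]
    chat_id = body.get("chat_id") or repo.create_chat(user_id)
    repo.save_message(chat_id, "user", question)

    # Save the placeholder first so the worker always has an item to update.
    message_id = str(uuid.uuid4())
    repo.save_placeholder_message(chat_id, message_id)

    # Fire-and-forget hand-off; the payload keys follow the worker-update flow.
    invoke_worker({
        "_async_worker": True,
        "question": question,
        "chat_id": chat_id,
        "message_id": message_id,
        "_user_id": user_id,
    })
    return {"chat_id": chat_id, "message_id": message_id, "status": "thinking"}
```

In a deployed Lambda, invoke_worker would presumably wrap the boto3 Lambda client's invoke() call with InvocationType="Event", as shown in the diagram. Injecting it as a callable keeps the ordering logic testable without AWS access.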
Flow: poll-message — POST /chats/messages with message_id filter
Inputs: chat_id (required), message_id (optional), cursor (optional), limit (optional, default 20)
Outputs — placeholder (not ready):
{ "messages": [{ "message_id": "...", "role": "assistant", "content": "", "ready": false, "created_at": "..." }], "next_cursor": null }
Outputs — ready:
{ "messages": [{ "message_id": "...", "role": "assistant", "content": "..answer..", "ready": true, "created_at": "...", "updated_at": "..." }], "next_cursor": null }
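One way to picture the message_id filter is a get_messages() that branches between a single-item read and the normal paginated history. The sketch below uses an in-memory stand-in for the repository. The class name, the offset-based cursor, and the choice to return an empty list for an unknown message_id are assumptions for illustration, not behavior defined by this spec.

```python
from typing import Optional


class InMemoryChatRepo:
    """Stand-in for the DynamoDB-backed repository (illustration only)."""

    def __init__(self, messages: dict):
        # Maps chat_id -> list of message dicts in creation order.
        self._messages = messages

    def get_message_by_id(self, chat_id: str, message_id: str) -> Optional[dict]:
        for msg in self._messages.get(chat_id, []):
            if msg["message_id"] == message_id:
                return msg
        return None

    def get_messages(
        self,
        chat_id: str,
        message_id: Optional[str] = None,
        cursor: Optional[str] = None,
        limit: int = 20,
    ) -> dict:
        if message_id is not None:
            # Poll path: one item at most, never paginated.
            msg = self.get_message_by_id(chat_id, message_id)
            return {"messages": [msg] if msg else [], "next_cursor": None}

        # History path: an integer offset stands in for a real DynamoDB cursor.
        start = int(cursor) if cursor else 0
        items = self._messages.get(chat_id, [])
        page = items[start:start + limit]
        end = start + len(page)
        return {"messages": page, "next_cursor": str(end) if end < len(items) else None}
```

The ownership check from Test 5 is deliberately left out here; it would sit in the Lambda handler before either path runs.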
Flow: worker-update — async worker saves answer
Inputs (Lambda event): _async_worker=true, question, chat_id, message_id, _user_id
Behavior:
1. Resolve GPU URL, wait for health (no API Gateway pressure)
2. Run ReasoningAgent.answer(question)
3. Call repo.update_message_ready(chat_id, message_id, content=answer)
4. Call repo.touch_chat(user_id, chat_id)
On any failure (GPU unavailable, agent error): update_message_ready(chat_id, message_id, content=fallback_text) — message always ends up ready=true.
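The "always ends up ready=true" guarantee is easiest to see as a try/except around the answer step. In this sketch, answer_question is a hypothetical callable standing in for GPU resolution, the health wait, and ReasoningAgent.answer combined, and FALLBACK_TEXT is placeholder copy rather than the real fallback string. Whether touch_chat should also run on failure is not specified here, so the sketch skips it in that case.

```python
from typing import Callable

# Placeholder copy; the real fallback text is not defined in this spec.
FALLBACK_TEXT = "Sorry, something went wrong. Please try again."


def run_worker(event: dict, repo, answer_question: Callable[[str], str]) -> str:
    """Process one async worker event and return the content that was stored."""
    chat_id = event["chat_id"]
    message_id = event["message_id"]
    try:
        answer = answer_question(event["question"])
    except Exception:
        # Any failure still flips the placeholder to ready so polling terminates.
        repo.update_message_ready(chat_id, message_id, content=FALLBACK_TEXT)
        return FALLBACK_TEXT

    repo.update_message_ready(chat_id, message_id, content=answer)
    repo.touch_chat(event["_user_id"], chat_id)
    return answer
```

A failure inside update_message_ready itself is not handled by this sketch; the client-side 60s poll timeout is the only backstop for that case.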
Acceptance Criteria
Test 1: dispatcher-returns-message-id
Given: GPU running, POST /memories/chat with question
Then: HTTP 200 with {chat_id, message_id, status:"thinking"} within 2s; placeholder in DynamoDB with ready=false; user message in DynamoDB
Test 2: poll-returns-ready-false-then-true
Given: message_id from dispatcher, worker still processing
Then: POST /chats/messages {chat_id, message_id} returns ready=false initially; after worker completes, returns ready=true with content
Test 3: gpu-unavailable-returns-error
Given: No GPU instance found, no launch template
Then: HTTP 200 with {error:"gpu_unavailable"}; no DynamoDB writes; frontend shows existing offline message (unchanged)
Test 4: worker-failure-marks-ready
Given: Worker Lambda errors mid-processing
Then: Placeholder updated to ready=true with fallback content; polling terminates
Test 5: message-id-filter-ownership
Given: Different user requests message from chat they don't own
Then: HTTP 403
Implementation Checklist
Files checklist
- [ ] main/server/layers/shared/python/shared/chat/repository.py — add save_placeholder_message(), update_message_ready(), get_message_by_id(); update get_messages() to accept message_id filter; save_message() returns str
- [ ] main/server/api/memories/chat/app.py — _dispatch_async(): generate message_id, save placeholder before invoking the worker, pass message_id to worker payload, return {chat_id, message_id, status}; implementation(): accept message_id from payload, call update_message_ready() instead of save_message() for assistant
- [ ] main/server/api/chats/messages/app.py — accept optional message_id param, pass to repo.get_messages()
- [ ] main/app/lib/api/memory/chatWithMemory.ts — update response type to include message_id; keep GpuUnavailableError for gpu_unavailable error
- [ ] main/app/lib/api/chats/getChatMessages.ts — add optional message_id param; add ready boolean to ChatMessage type
- [ ] main/app/lib/api/chats/pollForMessage.ts — NEW file: poll loop that calls getChatMessages with message_id until ready=true
- [ ] main/app/app/chat.tsx — update askQuestion(): call sendChatMessage(), get message_id, add placeholder bubble, call pollForMessage(), update bubble content when ready
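For the repository change, update_message_ready() could be built around a single conditional DynamoDB update. The helper below only assembles the keyword arguments that a boto3 Table.update_item() call would take, so it runs without AWS access. The key schema (chat_id as partition key, message_id as sort key), the ISO 8601 timestamp format, and the helper name are assumptions; the Messages table layout is not given in this spec.

```python
from datetime import datetime, timezone
from typing import Optional


def build_update_ready_kwargs(
    chat_id: str,
    message_id: str,
    content: str,
    now: Optional[datetime] = None,
) -> dict:
    """Assemble update_item kwargs that flip a placeholder to ready=true."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        # Assumed key schema: chat_id (partition key) + message_id (sort key).
        "Key": {"chat_id": chat_id, "message_id": message_id},
        "UpdateExpression": "SET #ready = :ready, #content = :content, #updated_at = :now",
        # Refuse to create a new item if the placeholder was never written.
        "ConditionExpression": "attribute_exists(#message_id)",
        # Name placeholders sidestep any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {
            "#ready": "ready",
            "#content": "content",
            "#updated_at": "updated_at",
            "#message_id": "message_id",
        },
        "ExpressionAttributeValues": {
            ":ready": True,
            ":content": content,
            ":now": timestamp,
        },
    }
```

With a boto3 Table resource this would be used as table.update_item(**build_update_ready_kwargs(chat_id, message_id, answer)). The condition expression makes a missing placeholder fail loudly instead of silently creating an orphan item.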
Flows checklist
- [ ] send-message: dispatcher saves placeholder, invokes worker async, returns message_id
- [ ] poll-message: POST /chats/messages with message_id filter returns ready flag
- [ ] worker-update: worker updates placeholder to ready=true with answer (or fallback)
- [ ] gpu-unavailable: no messages saved, existing error copy shown
- [ ] worker-failure: placeholder always updated to ready=true even on error
Notes
- GPU error copy on frontend is UNCHANGED — keep "The AI is offline right now. Please try again in a few minutes."
- save_message() in repository.py should return str (the message_id) for consistency, but existing callers that ignore the return value are unaffected
- No new Lambda: ChatMessagesFunction is updated in place
- ready attribute added to DynamoDB items (boolean; no schema migration needed because DynamoDB is schemaless; old messages without ready are treated as ready=true by the frontend since they have content)
- Poll interval: 500ms, max wait: 60s, then show timeout error
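The polling rules in these notes (500ms interval, 60s cap, missing ready treated as true) can be sketched as follows. The app-side loop is planned for pollForMessage.ts in TypeScript; this Python version only illustrates the same logic, for example as a helper in an integration test for Test 2. The function names, the PollTimeout exception, and the injectable sleep and clock parameters are assumptions for the example.

```python
import time
from typing import Callable


class PollTimeout(Exception):
    """Raised when the message is still not ready after max_wait_s."""


def is_ready(message: dict) -> bool:
    # Items written before this change have no ready attribute; treat them as ready.
    return message.get("ready", True)


def poll_for_message(
    fetch: Callable[[str, str], dict],
    chat_id: str,
    message_id: str,
    interval_s: float = 0.5,
    max_wait_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch(chat_id, message_id) until the message is ready or time runs out."""
    deadline = clock() + max_wait_s
    while True:
        messages = fetch(chat_id, message_id)["messages"]
        if messages and is_ready(messages[0]):
            return messages[0]
        if clock() >= deadline:
            raise PollTimeout(f"message {message_id} not ready after {max_wait_s}s")
        sleep(interval_s)
```

Passing sleep and clock as parameters lets a test drive the loop with a fake clock instead of waiting out the real timeout.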