Chat Message-ID Polling

System Intent

Change the chat flow so that POST /memories/chat returns a message_id immediately. GPU startup remains synchronous, and gpu_unavailable is returned if no GPU can be found or started. The app then polls POST /chats/messages for that specific message_id until ready=true. This decouples message creation from content delivery.

Mermaid Diagram

graph TD
    ChatScreen["Chat Screen\nmain/app/app/chat.tsx"]:::updated -->|"POST /memories/chat\n{question, chat_id?}"| APIGW["API Gateway"]:::unchanged
    APIGW --> Dispatcher["Dispatcher Lambda\napi/memories/chat/app.py"]:::updated
    Dispatcher -->|"1 find running EC2\n2 start stopped\n3 launch from template\nsync 2-3s"| GPURes["GPU Resolution"]:::updated
    GPURes -->|"no GPU"| GPUFail["gpu_unavailable\nexisting error message"]:::unchanged
    GPUFail --> APIGW
    GPURes -->|"GPU found"| Save["save user message\nsave placeholder assistant\nready=false content=''"]:::created
    Save -->|"InvocationType=Event\nmessage_id passed"| Worker["Async Worker Lambda"]:::updated
    Save -->|"return 1-2s"| APIGW
    APIGW -->|"200 chat_id message_id status=thinking"|ChatScreen
    ChatScreen -->|"poll every 500ms\nPOST /chats/messages\n{chat_id, message_id}"|GetMsgs["ChatMessagesFunction\napi/chats/messages/app.py"]:::updated
    GetMsgs -->|"point-read by message_id"|DDB["DynamoDB Messages\nready bool added"]:::updated
    DDB -->|"ready=false"|GetMsgs
    GetMsgs -->|"still waiting"|ChatScreen
    Worker -->|"ReasoningAgent.answer\nwait_for_gpu_health"| GPU["GPU Worker FastAPI"]:::unchanged
    GPU --> Worker
    Worker -->|"UPDATE SET ready=true content=answer"|DDB
    DDB -->|"ready=true"|GetMsgs
    GetMsgs -->|"stop polling render message"|ChatScreen
    classDef unchanged fill:#d3d3d3,stroke:#888
    classDef updated fill:#ffe58a,stroke:#888
    classDef created fill:#a8e6a3,stroke:#888

Black-Box Input/Output Contracts

Flow: send-message — POST /memories/chat dispatcher

Inputs: question (required), chat_id (optional), verbose (optional bool)

Outputs — GPU found:

{ "chat_id": "<uuid>", "message_id": "<ulid>", "status": "thinking" }

HTTP 200, within ~1-2s.

Outputs — GPU unavailable:

{ "error": "gpu_unavailable" }

HTTP 200 (existing behavior, existing error copy in frontend unchanged).

Side effects:
- New chat created in Chats table if no chat_id
- User message saved to Messages table (role=user)
- Placeholder assistant message saved to Messages table (role=assistant, content="", ready=false)
- Async worker Lambda invoked with message_id
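
The send-message contract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual handler: InMemoryRepo, dispatch(), and the invoke_worker callback are hypothetical stand-ins for the shared repository and the Lambda invoke call, and uuid4 replaces the ULID from the spec to keep the sketch stdlib-only.

```python
import uuid
from typing import Callable, Optional


class InMemoryRepo:
    """Hypothetical stand-in for the shared chat repository."""

    def __init__(self) -> None:
        self.messages: dict = {}

    def _put(self, chat_id: str, role: str, content: str, ready: bool) -> str:
        # The spec uses a ULID; uuid4 is used here only to avoid a dependency.
        message_id = uuid.uuid4().hex
        self.messages[(chat_id, message_id)] = {
            "message_id": message_id,
            "role": role,
            "content": content,
            "ready": ready,
        }
        return message_id

    def save_message(self, chat_id: str, role: str, content: str) -> str:
        return self._put(chat_id, role, content, ready=True)

    def save_placeholder_message(self, chat_id: str) -> str:
        return self._put(chat_id, "assistant", "", ready=False)


def dispatch(
    repo: InMemoryRepo,
    user_id: str,
    question: str,
    chat_id: Optional[str],
    gpu_available: bool,
    invoke_worker: Callable[[dict], None],
) -> dict:
    # GPU resolution is synchronous; with no GPU, nothing is written.
    if not gpu_available:
        return {"error": "gpu_unavailable"}
    chat_id = chat_id or str(uuid.uuid4())
    repo.save_message(chat_id, "user", question)
    # The placeholder is saved BEFORE the worker is invoked, so the worker
    # always has an existing item to update.
    message_id = repo.save_placeholder_message(chat_id)
    invoke_worker({
        "_async_worker": True,
        "question": question,
        "chat_id": chat_id,
        "message_id": message_id,
        "_user_id": user_id,
    })
    return {"chat_id": chat_id, "message_id": message_id, "status": "thinking"}
```

Writing the placeholder first means a fast worker cannot race ahead of the item it is meant to update.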

Flow: poll-message — POST /chats/messages with message_id filter

Inputs: chat_id (required), message_id (optional), cursor (optional), limit (optional, default 20)

Outputs — placeholder (not ready):

{ "messages": [{ "message_id": "...", "role": "assistant", "content": "", "ready": false, "created_at": "..." }], "next_cursor": null }

Outputs — ready:

{ "messages": [{ "message_id": "...", "role": "assistant", "content": "..answer..", "ready": true, "created_at": "...", "updated_at": "..." }], "next_cursor": null }
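
As a rough sketch of the poll-message contract, the message_id path could be a point-read guarded by an ownership check. The function name poll_message, the two in-memory dicts, the 403 body, and the choice to return an empty list for an unknown message_id are all assumptions made for illustration; the spec only fixes the response shape and the 403 status.

```python
from typing import Dict, Tuple


def poll_message(
    chat_owners: Dict[str, str],
    messages: Dict[Tuple[str, str], dict],
    user_id: str,
    chat_id: str,
    message_id: str,
) -> Tuple[int, dict]:
    """Return (status_code, body) for a message_id-filtered read.

    chat_owners maps chat_id -> owning user_id (hypothetical structure).
    messages maps (chat_id, message_id) -> stored item (hypothetical structure).
    """
    # A chat the caller does not own (or that does not exist) is rejected.
    if chat_owners.get(chat_id) != user_id:
        return 403, {"error": "forbidden"}  # error body is an assumption
    item = messages.get((chat_id, message_id))
    # Point-read: at most one message, so there is never a next page.
    return 200, {"messages": [item] if item else [], "next_cursor": None}
```

The body mirrors the placeholder and ready outputs above: the same item is returned on every poll, and only its ready and content fields change once the worker finishes.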

Flow: worker-update — async worker saves answer

Inputs (Lambda event): _async_worker=true, question, chat_id, message_id, _user_id

Behavior:
1. Resolve GPU URL, wait for health (no API Gateway pressure)
2. Run ReasoningAgent.answer(question)
3. Call repo.update_message_ready(chat_id, message_id, content=answer)
4. Call repo.touch_chat(user_id, chat_id)

On any failure (GPU unavailable, agent error): update_message_ready(chat_id, message_id, content=fallback_text) — message always ends up ready=true.
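
The "always ends up ready=true" guarantee can be sketched as a try/except around the agent call. This is an illustration only: run_worker, answer_fn, and FALLBACK_TEXT are hypothetical names, the fallback copy is invented, and skipping touch_chat on failure is an assumption the spec does not settle.

```python
from typing import Callable

# Hypothetical fallback copy; the real text is not given in this document.
FALLBACK_TEXT = "Sorry, I could not produce an answer this time."


def run_worker(event: dict, repo, answer_fn: Callable[[str], str]) -> None:
    """Resolve the answer and flip the placeholder to ready=true.

    answer_fn stands in for GPU resolution, the health wait, and
    ReasoningAgent.answer(question).
    """
    chat_id = event["chat_id"]
    message_id = event["message_id"]
    try:
        answer = answer_fn(event["question"])
    except Exception:
        # Any failure still marks the message ready so polling terminates.
        repo.update_message_ready(chat_id, message_id, content=FALLBACK_TEXT)
        return
    repo.update_message_ready(chat_id, message_id, content=answer)
    repo.touch_chat(event["_user_id"], chat_id)
```

A hard Lambda timeout would bypass the except branch entirely, so the client-side maximum wait remains the backstop in that case.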

Acceptance Criteria

Test 1: dispatcher-returns-message-id

Given: GPU running, POST /memories/chat with question
Then: HTTP 200 with {chat_id, message_id, status:"thinking"} within 2s; placeholder in DynamoDB with ready=false; user message in DynamoDB

Test 2: poll-returns-ready-false-then-true

Given: message_id from dispatcher, worker still processing
Then: POST /chats/messages {chat_id, message_id} returns ready=false initially; after worker completes, returns ready=true with content

Test 3: gpu-unavailable-returns-error

Given: No GPU instance found, no launch template
Then: HTTP 200 with {error:"gpu_unavailable"}; no DynamoDB writes; frontend shows existing offline message (unchanged)

Test 4: worker-failure-marks-ready

Given: Worker Lambda errors mid-processing
Then: Placeholder updated to ready=true with fallback content; polling terminates

Test 5: message-id-filter-ownership

Given: Different user requests message from chat they don't own
Then: HTTP 403

Implementation Checklist

Files checklist

[ ] main/server/layers/shared/python/shared/chat/repository.py — add save_placeholder_message(), update_message_ready(), get_message_by_id(); update get_messages() to accept message_id filter; save_message() returns str
[ ] main/server/api/memories/chat/app.py — _dispatch_async(): generate message_id, save placeholder before invoking the worker (so the worker always finds an item to update), pass message_id to worker payload, return {chat_id, message_id, status}; implementation(): accept message_id from payload, call update_message_ready() instead of save_message() for assistant
[ ] main/server/api/chats/messages/app.py — accept optional message_id param, pass to repo.get_messages()
[ ] main/app/lib/api/memory/chatWithMemory.ts — update response type to include message_id; keep GpuUnavailableError for gpu_unavailable error
[ ] main/app/lib/api/chats/getChatMessages.ts — add optional message_id param; add ready boolean to ChatMessage type
[ ] main/app/lib/api/chats/pollForMessage.ts — NEW file: poll loop that calls getChatMessages with message_id until ready=true
[ ] main/app/app/chat.tsx — update askQuestion(): call sendChatMessage(), get message_id, add placeholder bubble, call pollForMessage(), update bubble content when ready
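
One possible shape for update_message_ready() from the repository item above is a single conditional DynamoDB update. The sketch below only builds the keyword arguments, which could then be passed to a boto3 Table resource as table.update_item(**kwargs). The key schema (chat_id as partition key, message_id as sort key) and the helper name build_update_ready_kwargs are assumptions, since the document does not state the table layout.

```python
from datetime import datetime, timezone
from typing import Optional


def build_update_ready_kwargs(
    chat_id: str,
    message_id: str,
    content: str,
    now: Optional[datetime] = None,
) -> dict:
    """Build update_item kwargs that flip a placeholder to ready=true."""
    now = now or datetime.now(timezone.utc)
    return {
        # Assumed key schema: chat_id (partition) + message_id (sort).
        "Key": {"chat_id": chat_id, "message_id": message_id},
        "UpdateExpression": (
            "SET #ready = :ready, #content = :content, #updated_at = :updated_at"
        ),
        # Refuse to create a new item if the placeholder is missing.
        "ConditionExpression": "attribute_exists(#message_id)",
        # Name placeholders sidestep any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {
            "#ready": "ready",
            "#content": "content",
            "#updated_at": "updated_at",
            "#message_id": "message_id",
        },
        "ExpressionAttributeValues": {
            ":ready": True,
            ":content": content,
            ":updated_at": now.isoformat(),
        },
    }
```

The condition expression keeps the update from silently inserting an orphan assistant message if the placeholder was never written.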

Flows checklist

[ ] send-message: dispatcher saves placeholder, invokes worker async, returns message_id
[ ] poll-message: POST /chats/messages with message_id filter returns ready flag
[ ] worker-update: worker updates placeholder to ready=true with answer (or fallback)
[ ] gpu-unavailable: no messages saved, existing error copy shown
[ ] worker-failure: placeholder always updated to ready=true even on error

Notes

GPU error copy on frontend is UNCHANGED — keep "The AI is offline right now. Please try again in a few minutes."
save_message() in repository.py should return str (the message_id) for consistency, but existing callers that ignore the return value are unaffected
No new Lambda — ChatMessagesFunction updated in-place
ready attribute added to DynamoDB items (boolean, no schema migration needed — DynamoDB is schemaless; old messages without ready treated as ready=true by frontend since they have content)
Poll interval: 500ms, max wait: 60s, then show timeout error
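
The polling rules in these notes (500ms interval, 60s maximum, legacy messages without ready treated as ready) can be sketched as a small loop. The real client lives in TypeScript (pollForMessage.ts); this Python version only illustrates the timing logic and could serve in an integration test. The names poll_for_message, PollTimeout, and the fetch callback are hypothetical.

```python
import time
from typing import Callable, Optional


class PollTimeout(Exception):
    """Raised when the message is still not ready after max_wait_s."""


def poll_for_message(
    fetch: Callable[[], Optional[dict]],
    interval_s: float = 0.5,
    max_wait_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it returns a ready message or the deadline passes.

    fetch() should return the single message for the polled message_id,
    or None if the response contained no message.
    """
    deadline = clock() + max_wait_s
    while True:
        message = fetch()
        # Items written before this change have no "ready" attribute and
        # already carry content, so a missing flag counts as ready.
        if message is not None and message.get("ready", True):
            return message
        # Stop if the next attempt would land after the deadline.
        if clock() + interval_s > deadline:
            raise PollTimeout(f"message not ready after {max_wait_s}s")
        sleep(interval_s)
```

Injecting sleep and clock lets a test drive the loop with a fake timer instead of waiting in real time.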