GPU Pending Polling Fix
Plan Metadata
- Plan type: plan
- Parent plan: N/A
- Depends on: docs/plans/gpu-retry-notification.md
- Status: draft
Status semantics:
- draft: Plan is being created or updated and is not final.
- approved: Plan is approved but not yet applied in code.
- documentation: Code currently exists and matches the plan contract.
System Intent
When the GPU is starting up and the user sees "Starting up the GPU, we will notify you when we have an answer", the chat message currently never updates when the answer finally arrives. The user must manually reload to see the result.
Solution: Detect when pollForMessage encounters gpu_pending=true (set by the backend's mark_gpu_pending call). Instead of throwing an error (which stops polling), switch to slow polling (every 30s) with a 30-minute timeout. Update the placeholder bubble text once to "Starting up the GPU, we will notify you when we have an answer", then keep polling until the backend retry Lambda finishes and saves the answer with ready=true. The existing placeholder-update flow then replaces the text with the actual answer.
Primary consumers:
- Chat screen (main/app/app/chat.tsx)
- Message polling logic (main/app/lib/api/chats/pollForMessage.ts)
Boundary (black-box scope):
- Input: pollForMessage() options with chat_id and message_id
- Output: ChatMessage with ready=true and final content
- Intermediate state: gpu_pending messages detected in the polling loop
Stage Gate Tracker
- [ ] Stage 1 Mermaid approved
- [ ] Stage 2 I/O contracts approved
- [ ] Stage 3 pseudocode/technical details approved or skipped
1. Mermaid Diagram
```mermaid
graph TD
ChatScreen["Chat Screen\n(chat.tsx)"]:::app
PollFunc["pollForMessage()\n(pollForMessage.ts)"]:::lib
GetMessages["GET /chats/messages"]:::api
Dispatcher["Dispatcher Lambda"]:::backend
Worker["Async Worker Lambda"]:::backend
Retry["ChatGpuRetryFunction\n(backend)"]:::backend
DDB["DynamoDB Messages"]:::data
ChatScreen -->|onGpuPending callback +\n30-second polling| PollFunc
ChatScreen -->|POST /memories/chat\n(question)| Dispatcher
Dispatcher -->|save placeholder\nready=false gpu_pending=false| DDB
Dispatcher -->|invoke worker async| Worker
Dispatcher -->|return message_id| ChatScreen
ChatScreen -->|addMessage(placeholder)| ChatScreen
ChatScreen -->|start polling\nwith callback| PollFunc
PollFunc -->|GET /chats/messages\nevery 500ms| GetMessages
GetMessages -->|ready=false gpu_pending=false| DDB
GetMessages -->|returns message| PollFunc
Worker -->|GPU timeout\nmark_gpu_pending\ngpu_pending=true| DDB
PollFunc -->|gpu_pending=true detected\ncall onGpuPending() once| ChatScreen
ChatScreen -->|update placeholder bubble\nto Starting up text| ChatScreen
PollFunc -->|switch to 30s interval| PollFunc
Worker -->|GPU timeout| Retry
Retry -->|save answer ready=true| DDB
PollFunc -->|GET /chats/messages\nevery 30s| GetMessages
GetMessages -->|ready=true + content| DDB
GetMessages -->|returns message| PollFunc
PollFunc -->|resolve with answer| ChatScreen
ChatScreen -->|update placeholder\nwith final answer| ChatScreen
classDef app fill:#ffe58a,stroke:#888
classDef lib fill:#a8e6a3,stroke:#888
classDef api fill:#d3d3d3,stroke:#888
classDef backend fill:#87ceeb,stroke:#888
classDef data fill:#dda0dd,stroke:#888
```
2. Black-Box Inputs and Outputs
Global Types
ChatMessage {
message_id: string (unique message identifier)
chat_id: string (chat session identifier)
role: "user" | "assistant"
content: string (message body text)
created_at: timestamp
ready: boolean (true = content ready, false = placeholder pending)
updated_at: timestamp (optional)
gpu_pending: boolean (optional; true = GPU starting up, answer pending on retry)
}
PollForMessageOptions {
chat_id: string (required; chat session ID)
message_id: string (required; message to poll)
intervalMs: number (optional; default 500ms; initial polling interval)
maxWaitMs: number (optional; default 60000ms; raised to at least 1800000ms = 30min when onGpuPending is provided)
onGpuPending: () => void (optional; callback when gpu_pending is detected; fires once)
gpuPendingIntervalMs: number (optional; default 30000ms; slow polling interval after gpu_pending is detected)
}
ChatWithMemoryResponse {
chat_id: string | null
message_id: string | null
status: string | undefined (e.g., "thinking")
answer: string | undefined
trace: Trace | undefined
}
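The contracts above can be sketched as TypeScript types, together with a small helper that applies the documented defaults, including the rule that passing onGpuPending raises the timeout to at least 30 minutes. This is a minimal sketch under assumptions: `resolvePollConfig`, `ResolvedPollConfig`, and the constant names are hypothetical, timestamps are modeled as ISO strings, and the real definitions in pollForMessage.ts may differ.

```typescript
// Sketch of the plan's contracts. Field names follow the plan;
// modeling timestamps as ISO-8601 strings is an assumption.
interface ChatMessage {
  message_id: string;
  chat_id: string;
  role: "user" | "assistant";
  content: string;
  created_at: string;
  ready: boolean;
  updated_at?: string;
  gpu_pending?: boolean;
}

interface PollForMessageOptions {
  chat_id: string;
  message_id: string;
  intervalMs?: number;
  maxWaitMs?: number;
  onGpuPending?: () => void;
  gpuPendingIntervalMs?: number;
}

// Effective settings after defaults are applied (hypothetical helper type).
interface ResolvedPollConfig {
  intervalMs: number;
  gpuPendingIntervalMs: number;
  maxWaitMs: number;
  gpuPendingEnabled: boolean;
}

const DEFAULT_INTERVAL_MS = 500;
const DEFAULT_GPU_PENDING_INTERVAL_MS = 30000;
const DEFAULT_MAX_WAIT_MS = 60000;
const GPU_PENDING_MAX_WAIT_MS = 1800000; // 30 minutes

function resolvePollConfig(options: PollForMessageOptions): ResolvedPollConfig {
  // gpu_pending handling is opt-in: it is enabled only when a callback is passed.
  const gpuPendingEnabled = options.onGpuPending !== undefined;
  const baseMaxWait = options.maxWaitMs ?? DEFAULT_MAX_WAIT_MS;
  return {
    intervalMs: options.intervalMs ?? DEFAULT_INTERVAL_MS,
    gpuPendingIntervalMs: options.gpuPendingIntervalMs ?? DEFAULT_GPU_PENDING_INTERVAL_MS,
    // Opting in raises the timeout to at least 30 minutes; a longer explicit value is kept.
    maxWaitMs: gpuPendingEnabled ? Math.max(baseMaxWait, GPU_PENDING_MAX_WAIT_MS) : baseMaxWait,
    gpuPendingEnabled,
  };
}
```

Keeping the defaulting in one pure function makes the backward-compatible path (no callback, 60s timeout) easy to assert on its own.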
Flow: poll-message-with-gpu-pending
- Test files: main/app/__tests__/chat-screen-polling.test.tsx
- Core files: main/app/lib/api/chats/pollForMessage.ts, main/app/app/chat.tsx
Type Definitions
PollForMessageInput {
chat_id: string (required)
message_id: string (required)
intervalMs: number (initial polling interval, default 500ms)
maxWaitMs: number (total timeout, default 60000ms; raised to at least 30min when onGpuPending is provided)
onGpuPending: () => void (optional callback; called once when gpu_pending detected)
gpuPendingIntervalMs: number (slow polling interval when gpu_pending, default 30s)
}
PollForMessageOutput {
message: ChatMessage (with ready=true and final content)
}
PollForMessageError {
name: "PollTimeoutError"
message: "Timed out waiting for message to be ready"
}
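A minimal sketch of an error class that satisfies the PollForMessageError contract, assuming the timeout is surfaced as a subclass of `Error`. The class body and the `isPollTimeout` helper are illustrative, not taken from the existing code.

```typescript
// Hypothetical error class matching the PollForMessageError contract:
// name "PollTimeoutError" and a fixed message.
class PollTimeoutError extends Error {
  constructor() {
    super("Timed out waiting for message to be ready");
    this.name = "PollTimeoutError";
    // Keeps instanceof working when compiled to ES5 targets.
    Object.setPrototypeOf(this, PollTimeoutError.prototype);
  }
}

// Lets callers tell a poll timeout apart from other failures (network, 5xx).
function isPollTimeout(err: unknown): err is PollTimeoutError {
  return err instanceof PollTimeoutError;
}
```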
Paths
| path-name | input | output/expected state change | path-type | notes | updated |
|---|---|---|---|---|---|
| poll-message.ready-immediately | PollForMessageInput; message has ready=true on first poll | PollForMessageOutput with ready=true content; no callback fired | happy path | GPU available from start; answer returned by dispatcher or worker immediately | Y |
| poll-message.gpu-pending-detected | PollForMessageInput with onGpuPending; message has gpu_pending=true on 2nd+ poll | onGpuPending callback fired once; polling continues at 30s interval; no error thrown | new path | Detects gpu_pending=true, calls callback, switches interval, continues polling | Y |
| poll-message.gpu-pending-resolves | PollForMessageInput with onGpuPending; message has gpu_pending=true, then ready=true after 30s+ | PollForMessageOutput with final answer content; callback fired once earlier | new subpath | After gpu_pending is detected and the callback fires, polling continues at 30s intervals until ready=true | Y |
| poll-message.gpu-pending-timeout | PollForMessageInput with onGpuPending; message stays gpu_pending=true for >30 minutes | PollTimeoutError thrown; maxWaitMs (30min) exceeded | error | Backend GPU retry Lambda exceeds max attempts; user sees timeout error | Y |
| poll-message.timeout-no-gpu-pending | PollForMessageInput without onGpuPending; message stays ready=false, gpu_pending=false for >60s | PollTimeoutError thrown; maxWaitMs (60s default) exceeded | error | Standard polling timeout when GPU is not involved | |
| poll-message.backward-compatible | PollForMessageInput without onGpuPending or gpuPendingIntervalMs | Polling uses default intervals and ignores the gpu_pending attribute | legacy path | Defaults applied; callback optional | |
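The per-poll decision behind these paths can be written as a pure function, which keeps the path logic testable without timers. A sketch, assuming a hypothetical `decidePollStep` helper and a trimmed message shape; the timeout paths depend on the deadline, which stays in the polling loop.

```typescript
// Minimal view of a polled message (assumed shape; see ChatMessage above).
interface PolledMessage {
  ready: boolean;
  gpu_pending?: boolean;
  content: string;
}

type PollStep =
  | { kind: "resolve" }
  | { kind: "wait"; intervalMs: number; fireCallback: boolean };

// Decides what one poll round should do (hypothetical helper).
// gpuPendingEnabled: true when the caller passed onGpuPending.
// alreadyFired: true once the callback has run in this poll session.
function decidePollStep(
  msg: PolledMessage | undefined,
  gpuPendingEnabled: boolean,
  alreadyFired: boolean,
  intervalMs: number,
  gpuPendingIntervalMs: number,
): PollStep {
  if (msg?.ready === true) return { kind: "resolve" };
  // Without a callback (legacy path) the gpu_pending attribute is ignored.
  const gpuPending = gpuPendingEnabled && msg?.gpu_pending === true;
  return {
    kind: "wait",
    // Once gpu_pending has been seen, the slow interval is kept for the rest of the session.
    intervalMs: gpuPending || alreadyFired ? gpuPendingIntervalMs : intervalMs,
    fireCallback: gpuPending && !alreadyFired,
  };
}
```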
Flow: chat-screen-gpu-pending-ui-update
- Test files: main/app/__tests__/chat-screen-polling.test.tsx
- Core files: main/app/app/chat.tsx
Type Definitions
ChatScreenGpuPendingInput {
placeholderId: string (message_id from dispatcher response)
placeholderContent: string (current empty or temporary content)
onGpuPending: callback (fired when gpu_pending detected during polling)
}
ChatScreenGpuPendingOutput {
placeholderUpdated: boolean (true when "Starting up..." message set)
finalAnswerUpdated: boolean (true when actual answer replaces placeholder)
userState: "loading-thinking" | "loading-gpu-pending" | "answer-ready"
}
Paths
| path-name | input | output/expected state change | path-type | notes | updated |
|---|---|---|---|---|---|
| chat-screen.gpu-pending-callback-fired | onGpuPending() triggered from pollForMessage | Message bubble text updated to "Starting up the GPU, we will notify you when we have an answer"; id stays the same (placeholderId) | new path | Callback handler replaces the empty bubble with "Starting up..." text; keyed to the same placeholderId | Y |
| chat-screen.gpu-pending-resolves-to-answer | pollForMessage() resolves with ready=true content after gpu_pending | Same placeholder (placeholderId) updated with final answer; isLoading → false | new subpath | Final answer replaces "Starting up..." text in the same bubble | Y |
| chat-screen.gpu-unavailable-error-path | chatWithMemory() throws GpuUnavailableError | Shows "Starting up the GPU, we will notify you when we have an answer"; no polling starts (different error path) | existing path | This path (synchronous dispatcher failure) is unchanged; polling does not start | |
| chat-screen.gpu-pending-timeout-error | pollForMessage() exceeds 30min timeout while gpu_pending | Error message shown; isLoading → false; no retry UI | error | User sees generic error message; GPU retry Lambda gave up | Y |
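Both bubble updates on the chat screen reduce to the same immutable update keyed by placeholderId. A sketch, assuming a simplified `ChatBubble` shape and a hypothetical `replacePlaceholderContent` helper; the real message type in chat.tsx may carry more fields.

```typescript
// Simplified bubble shape (assumed; the real type in chat.tsx may differ).
interface ChatBubble {
  id: string;
  role: "user" | "assistant";
  content: string;
}

const GPU_PENDING_TEXT = "Starting up the GPU, we will notify you when we have an answer";

// Returns a new array in which only the bubble with placeholderId gets new content.
// Other bubbles keep their object identity; the input array is not mutated.
function replacePlaceholderContent(
  messages: ChatBubble[],
  placeholderId: string,
  content: string,
): ChatBubble[] {
  return messages.map((m) => (m.id === placeholderId ? { ...m, content } : m));
}
```

Because only the content changes and the id is preserved, a list keyed by id (an assumption about chat.tsx) re-renders the same bubble in place, first with the "Starting up..." text and then with the final answer.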
3. Pseudocode / Technical Details
Critical Flow: poll-message-with-gpu-pending
```
pollForMessage(options):
    chat_id = options.chat_id
    message_id = options.message_id
    intervalMs = options.intervalMs ?? 500
    gpuPendingIntervalMs = options.gpuPendingIntervalMs ?? 30000
    onGpuPending = options.onGpuPending ?? null
    maxWaitMs = options.maxWaitMs ?? 60000

    // gpu_pending handling is opt-in: only callers that pass onGpuPending get
    // the 30-minute timeout, the interval switch, and the callback
    if onGpuPending is not null:
        maxWaitMs = max(maxWaitMs, 1800000)  // 30 min

    deadline = now() + maxWaitMs
    gpuPendingCallbackFired = false
    currentIntervalMs = intervalMs

    poll():
        if now() >= deadline:
            throw PollTimeoutError()

        messages = GET /chats/messages?message_id=message_id
        msg = messages[0]  // may be missing if the message is not visible yet

        if msg exists and msg.ready === true:
            log("poll-message", "ready", {...})
            return msg

        // NEW: detect gpu_pending (opt-in callers only) and switch to slow polling
        if onGpuPending is not null and msg exists and msg.gpu_pending === true and not gpuPendingCallbackFired:
            log("poll-message", "gpu-pending-detected", {...})
            gpuPendingCallbackFired = true
            onGpuPending()  // notify chat.tsx to update the UI
            currentIntervalMs = gpuPendingIntervalMs  // permanent switch

        log("poll-message", "not-ready", {ready: msg?.ready, gpu_pending: msg?.gpu_pending})

        if now() >= deadline:
            throw PollTimeoutError()

        sleep(currentIntervalMs)
        return poll()  // continue with the (possibly updated) interval

    return poll()
```
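A runnable TypeScript sketch of the pollForMessage pseudocode. To keep it testable, it assumes the HTTP call, clock, and sleep are injected (`PollDeps`, `fetchMessages`, `now`, and `sleep` are hypothetical names); the real pollForMessage.ts presumably calls the API client directly. It uses a `while` loop instead of recursion, with the same control flow.

```typescript
interface PolledMessage {
  message_id: string;
  ready: boolean;
  gpu_pending?: boolean;
  content: string;
}

class PollTimeoutError extends Error {
  constructor() {
    super("Timed out waiting for message to be ready");
    this.name = "PollTimeoutError";
    Object.setPrototypeOf(this, PollTimeoutError.prototype);
  }
}

// Hypothetical injected dependencies: fetchMessages stands in for
// GET /chats/messages; now/sleep allow a fake clock in tests.
interface PollDeps {
  fetchMessages: (chatId: string, messageId: string) => Promise<PolledMessage[]>;
  now: () => number;
  sleep: (ms: number) => Promise<void>;
}

interface PollOptions {
  chat_id: string;
  message_id: string;
  intervalMs?: number;
  maxWaitMs?: number;
  onGpuPending?: () => void;
  gpuPendingIntervalMs?: number;
}

async function pollForMessage(options: PollOptions, deps: PollDeps): Promise<PolledMessage> {
  const intervalMs = options.intervalMs ?? 500;
  const gpuPendingIntervalMs = options.gpuPendingIntervalMs ?? 30000;
  const onGpuPending = options.onGpuPending;
  let maxWaitMs = options.maxWaitMs ?? 60000;
  if (onGpuPending) maxWaitMs = Math.max(maxWaitMs, 1800000); // opt-in: at least 30 min

  const deadline = deps.now() + maxWaitMs;
  let callbackFired = false;
  let currentIntervalMs = intervalMs;

  while (deps.now() < deadline) {
    const [msg] = await deps.fetchMessages(options.chat_id, options.message_id);
    if (msg?.ready === true) return msg;

    // Only callers that pass onGpuPending get gpu_pending handling.
    if (onGpuPending && msg?.gpu_pending === true && !callbackFired) {
      callbackFired = true;
      onGpuPending();
      currentIntervalMs = gpuPendingIntervalMs; // permanent switch to slow polling
    }

    // Do not sleep past the deadline; fail right away instead.
    if (deps.now() >= deadline) break;
    await deps.sleep(currentIntervalMs);
  }
  throw new PollTimeoutError();
}
```

Injecting `now` and `sleep` lets a test drive the whole 30-minute window instantly with a fake clock instead of real timers.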
Critical Flow: chat-screen-gpu-pending-callback
```
askQuestion(question):
    try:
        // ... send question via chatWithMemory(), get message_id ...
        if status === "thinking" and message_id and chat_id:
            placeholderId = message_id
            setMessages((prev) => [...prev, {id: placeholderId, content: "", role: "assistant"}])

            // NEW: define the callback before polling
            onGpuPendingCallback = () => {
                setMessages((prev) =>
                    prev.map((m) =>
                        m.id === placeholderId
                            ? {...m, content: "Starting up the GPU, we will notify you when we have an answer"}
                            : m
                    )
                )
            }

            // Poll with the callback and the extended timeout
            readyMsg = await pollForMessage({
                chat_id,
                message_id,
                onGpuPending: onGpuPendingCallback,
                gpuPendingIntervalMs: 30000,
                maxWaitMs: 1800000  // 30 min
            })

            // When the poll resolves, replace the placeholder text with the final answer
            setMessages((prev) =>
                prev.map((m) =>
                    m.id === placeholderId
                        ? {...m, content: readyMsg.content}
                        : m
                )
            )
    catch error:
        // GpuUnavailableError (existing path): show the "Starting up..." message; no polling
        // PollTimeoutError (after 30 min): show the generic error message; no retry UI
    finally:
        setIsLoading(false)  // runs only after pollForMessage() settles
```
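A framework-free sketch of the chat-screen flow, assuming React's `setMessages`/`setIsLoading` and the poll function are passed in as plain functions (`ChatScreenDeps`, `awaitAnswer`, and `GENERIC_ERROR_TEXT` are hypothetical names). Showing the timeout error inside the placeholder bubble is also an assumption; the plan only specifies "a generic error message".

```typescript
interface ChatBubble {
  id: string;
  role: "user" | "assistant";
  content: string;
}

interface ReadyMessage {
  content: string;
}

// Hypothetical stand-ins for React state setters and the poll function.
interface ChatScreenDeps {
  setMessages: (update: (prev: ChatBubble[]) => ChatBubble[]) => void;
  setIsLoading: (loading: boolean) => void;
  pollForMessage: (opts: {
    chat_id: string;
    message_id: string;
    onGpuPending: () => void;
    gpuPendingIntervalMs: number;
    maxWaitMs: number;
  }) => Promise<ReadyMessage>;
}

const GPU_PENDING_TEXT = "Starting up the GPU, we will notify you when we have an answer";
const GENERIC_ERROR_TEXT = "Something went wrong. Please try again."; // assumed wording

async function awaitAnswer(chatId: string, messageId: string, deps: ChatScreenDeps): Promise<void> {
  const placeholderId = messageId;
  // Every update targets the same bubble, keyed by placeholderId.
  const setContent = (content: string) =>
    deps.setMessages((prev) => prev.map((m) => (m.id === placeholderId ? { ...m, content } : m)));

  deps.setMessages((prev) => [...prev, { id: placeholderId, role: "assistant", content: "" }]);
  try {
    const readyMsg = await deps.pollForMessage({
      chat_id: chatId,
      message_id: messageId,
      onGpuPending: () => setContent(GPU_PENDING_TEXT),
      gpuPendingIntervalMs: 30000,
      maxWaitMs: 1800000,
    });
    setContent(readyMsg.content);
  } catch {
    setContent(GENERIC_ERROR_TEXT);
  } finally {
    deps.setIsLoading(false); // only after the poll settles
  }
}
```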
Implementation Notes
- Backward Compatibility: `onGpuPending` is optional. If it is not provided, polling behaves as before: no gpu_pending detection, no interval switching, and no timeout extension.
- Callback Firing: The callback fires exactly once per poll session, on the first detection of `gpu_pending=true`. A flag (`gpuPendingCallbackFired`) ensures it does not fire again.
- Interval Switching: Once `gpu_pending=true` is detected, the polling interval switches permanently from `intervalMs` (500ms) to `gpuPendingIntervalMs` (30s default). This reduces backend load during long retries.
- Timeout Extension: When `onGpuPending` is provided, `maxWaitMs` is automatically extended to at least 30 minutes (1800000ms) to match the backend GPU retry Lambda's maximum retry window (15 retries × 2-minute intervals).
- Error Paths: The `GpuUnavailableError` path (synchronous dispatcher failure) is unchanged: no polling starts and no callback is needed.
- Placeholder Keying: The same `placeholderId` (message_id from the dispatcher) is used for the initial empty bubble, the "Starting up..." text, and the final answer. This keeps the transition smooth, in the same bubble position.
- Finally Block: The `finally` block in `askQuestion()` must not call `setIsLoading(false)` until `pollForMessage()` settles (resolves or rejects). The current code already does this.
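The load reduction from the interval switch and the 30-minute window can be checked with simple arithmetic. A sketch of the worst-case number of GET /chats/messages requests in one window, assuming one request per polling round and ignoring request latency (`maxRequests` and `BACKEND_RETRY_WINDOW_MS` are hypothetical names):

```typescript
// Upper bound on requests in a window when each round is one request plus one sleep.
function maxRequests(windowMs: number, intervalMs: number): number {
  return Math.ceil(windowMs / intervalMs);
}

// 15 retries x 2-minute intervals, per the backend retry plan.
const BACKEND_RETRY_WINDOW_MS = 15 * 2 * 60 * 1000;

const fastPolling = maxRequests(BACKEND_RETRY_WINDOW_MS, 500); // 500ms for the whole window
const slowPolling = maxRequests(BACKEND_RETRY_WINDOW_MS, 30000); // 30s for the whole window
```

Polling at 500ms for the full window would allow up to 3600 requests; at 30s it is at most 60. Because the frontend deadline equals the backend retry window, an answer saved right at the end of the final retry could land just after the poll times out, so a small margin on `maxWaitMs` may be worth considering.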
4. Acceptance Criteria
The following tests validate the GPU pending polling behavior:
Test 1: gpu-pending-detected-callback-fired
Given: pollForMessage() with onGpuPending callback; backend returns gpu_pending=true on second poll Then: Callback fires exactly once; polling continues; no error thrown; interval switches to 30s
Test 2: gpu-pending-resolves-with-answer
Given: pollForMessage() detects gpu_pending=true, callback fires; after 30s+, backend returns ready=true with answer Then: Poll resolves with final message; chat screen updates placeholder bubble with answer
Test 3: gpu-pending-ui-update-flow
Given: Chat screen calls askQuestion(), dispatcher returns message_id, poll detects gpu_pending=true Then: Placeholder bubble created; onGpuPending callback fires; bubble text updates to "Starting up..."; user can see progress without reloading
Test 4: gpu-pending-timeout-exceeded
Given: Poll with gpu_pending=true for >30 minutes without ready=true Then: PollTimeoutError thrown; error message displayed; isLoading set to false
Test 5: backward-compatibility-no-callback
Given: pollForMessage() called without onGpuPending callback Then: Polling behavior unchanged; gpu_pending attribute ignored; standard 60s timeout applies
Test 6: gpu-unavailable-error-unchanged
Given: chatWithMemory() throws GpuUnavailableError (synchronous dispatcher failure) Then: No polling starts; "Starting up..." message shown in catch block; different from gpu_pending path
Test 7: gpu-pending-callback-fires-only-once
Given: Poll receives multiple messages with gpu_pending=true Then: Callback fired on first detection only; does not fire on subsequent polling rounds
Test 8: interval-switch-verified
Given: Poll starts with 500ms interval; detects gpu_pending=true Then: Subsequent polls use 30s interval; confirmed via logging or explicit timing checks
ONCE YOU GET APPROVAL FROM THE DEVELOPER FOR STAGES 1–3, DELETE THESE LINES AND UPDATE THE STAGE GATE TRACKER BELOW
Stage Gate Tracker (Updated)
- [ ] Stage 1 Mermaid approved
- [ ] Stage 2 I/O contracts approved
- [ ] Stage 3 pseudocode/technical details approved or skipped