GPU Pending Polling Fix

Plan Metadata

Plan type: plan
Parent plan: N/A
Depends on: docs/plans/gpu-retry-notification.md
Status: draft

Status semantics:
- draft: Plan is being created or updated and is not final.
- approved: Plan is approved but not yet applied in code.
- documentation: Code currently exists and matches the plan contract.

System Intent

When the GPU is starting up and the user sees "Starting up the GPU, we will notify you when we have an answer", the chat message currently never updates when the answer finally arrives. The user must manually reload to see the result.

Solution: Detect when pollForMessage encounters gpu_pending=true (from the backend's mark_gpu_pending call). Instead of throwing an error (which stops polling), switch to slow polling (every 30s) with a 30-minute timeout. Update the placeholder bubble text once to "Starting up...", then wait for the backend retry Lambda to finish and save the answer with ready=true. The existing placeholder-update flow then replaces the text with the actual answer.

Primary consumers:
- Chat screen (main/app/app/chat.tsx)
- Message polling logic (main/app/lib/api/chats/pollForMessage.ts)

Boundary (black-box scope):
- Input: pollForMessage() options with chat_id and message_id
- Output: ChatMessage with ready=true and final content
- Intermediate state: gpu_pending messages detected in polling loop

Stage Gate Tracker

[ ] Stage 1 Mermaid approved
[ ] Stage 2 I/O contracts approved
[ ] Stage 3 pseudocode/technical details approved or skipped

1. Mermaid Diagram

graph TD
    ChatScreen["Chat Screen\n(chat.tsx)"]:::app
    PollFunc["pollForMessage()\n(pollForMessage.ts)"]:::lib
    GetMessages["GET /chats/messages"]:::api
    Dispatcher["Dispatcher Lambda"]:::backend
    Worker["Async Worker Lambda"]:::backend
    Retry["ChatGpuRetryFunction\n(backend)"]:::backend
    DDB["DynamoDB Messages"]:::data

    ChatScreen -->|onGpuPending callback +\n30-second polling| PollFunc
    ChatScreen -->|"POST /memories/chat\n(question)"| Dispatcher
    Dispatcher -->|save placeholder\nready=false gpu_pending=false| DDB
    Dispatcher -->|invoke worker async\nreturn message_id| ChatScreen

    ChatScreen -->|"addMessage(placeholder)"| ChatScreen
    ChatScreen -->|start polling\nwith callback| PollFunc

    PollFunc -->|GET /chats/messages\nevery 500ms| GetMessages
    GetMessages -->|ready=false gpu_pending=false| DDB
    GetMessages -->|returns message| PollFunc

    PollFunc -->|gpu_pending=true detected\ncall onGpuPending() once| ChatScreen
    ChatScreen -->|update placeholder bubble\nto Starting up text| ChatScreen

    PollFunc -->|switch to 30s interval| PollFunc
    Worker -->|GPU timeout| Retry
    Retry -->|save answer ready=true| DDB

    PollFunc -->|GET /chats/messages\nevery 30s| GetMessages
    GetMessages -->|ready=true + content| DDB
    GetMessages -->|returns message| PollFunc
    PollFunc -->|resolve with answer| ChatScreen

    ChatScreen -->|update placeholder\nwith final answer| ChatScreen

    classDef app fill:#ffe58a,stroke:#888
    classDef lib fill:#a8e6a3,stroke:#888
    classDef api fill:#d3d3d3,stroke:#888
    classDef backend fill:#87ceeb,stroke:#888
    classDef data fill:#dda0dd,stroke:#888

2. Black-Box Inputs and Outputs

Global Types

ChatMessage {
  message_id: string (unique message identifier)
  chat_id: string (chat session identifier)
  role: "user" | "assistant"
  content: string (message body text)
  created_at: timestamp
  ready: boolean (true = content ready, false = placeholder pending)
  updated_at: timestamp (optional)
  gpu_pending: boolean (optional; true = GPU starting up, answer pending on retry)
}

PollForMessageOptions {
  chat_id: string (required; chat session ID)
  message_id: string (required; message to poll)
  intervalMs: number (optional; default 500ms — initial polling interval)
  maxWaitMs: number (optional; default 60000ms = 60s; raised to at least 1800000ms = 30min when onGpuPending is provided)
  onGpuPending: () => void (optional; callback when gpu_pending detected; fires once)
  gpuPendingIntervalMs: number (optional; default 30000ms — slow polling interval when gpu_pending)
}

ChatWithMemoryResponse {
  chat_id: string | null
  message_id: string | null
  status: string | undefined (e.g., "thinking")
  answer: string | undefined
  trace: Trace | undefined
}
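
The option defaults above can be sketched as TypeScript interfaces plus a small resolver. This is a minimal sketch, not the app's actual code: `resolvePollOptions`, the constant names, and `Timestamp` as an ISO string are assumptions made for illustration, and the default values are taken from the pseudocode in section 3.

```typescript
// Assumption: timestamps travel as ISO-8601 strings; the plan only says "timestamp".
type Timestamp = string;

interface ChatMessage {
  message_id: string;
  chat_id: string;
  role: "user" | "assistant";
  content: string;
  created_at: Timestamp;
  ready: boolean;
  updated_at?: Timestamp;
  gpu_pending?: boolean;
}

interface PollForMessageOptions {
  chat_id: string;
  message_id: string;
  intervalMs?: number;
  maxWaitMs?: number;
  onGpuPending?: () => void;
  gpuPendingIntervalMs?: number;
}

// Hypothetical constant names; the values come from the plan.
const DEFAULT_INTERVAL_MS = 500;
const DEFAULT_MAX_WAIT_MS = 60000;
const GPU_PENDING_INTERVAL_MS = 30000;
const GPU_PENDING_MAX_WAIT_MS = 1800000;

// Hypothetical helper: applies the defaults and the timeout extension in one place.
function resolvePollOptions(options: PollForMessageOptions) {
  const baseMaxWaitMs = options.maxWaitMs ?? DEFAULT_MAX_WAIT_MS;
  return {
    intervalMs: options.intervalMs ?? DEFAULT_INTERVAL_MS,
    gpuPendingIntervalMs: options.gpuPendingIntervalMs ?? GPU_PENDING_INTERVAL_MS,
    // The 30-minute floor applies only when the caller opts in with onGpuPending.
    maxWaitMs: options.onGpuPending
      ? Math.max(baseMaxWaitMs, GPU_PENDING_MAX_WAIT_MS)
      : baseMaxWaitMs,
  };
}
```

Keeping the resolution in one function makes the 60s versus 30min distinction testable without running a poll loop.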

Flow: `poll-message-with-gpu-pending`

Test files: main/app/__tests__/chat-screen-polling.test.tsx
Core files: main/app/lib/api/chats/pollForMessage.ts, main/app/app/chat.tsx

Type Definitions

PollForMessageInput {
  chat_id: string (required)
  message_id: string (required)
  intervalMs: number (initial polling interval, default 500ms)
  maxWaitMs: number (total timeout, default 60s; raised to at least 30min when onGpuPending is provided)
  onGpuPending: () => void (optional callback; called once when gpu_pending detected)
  gpuPendingIntervalMs: number (slow polling interval when gpu_pending, default 30s)
}

PollForMessageOutput {
  message: ChatMessage (with ready=true and final content)
}

PollForMessageError {
  name: "PollTimeoutError"
  message: "Timed out waiting for message to be ready"
}
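
One way to express the error shape above is a small `Error` subclass. This is a sketch under the assumption that the app throws a dedicated class; the plan only fixes the `name` and `message` values.

```typescript
class PollTimeoutError extends Error {
  constructor(message: string = "Timed out waiting for message to be ready") {
    super(message);
    this.name = "PollTimeoutError";
    // Restore the prototype chain so `instanceof` still works when compiled to ES5.
    Object.setPrototypeOf(this, PollTimeoutError.prototype);
  }
}
```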

Paths

path-name	input	output/expected state change	path-type	notes	updated
`poll-message.ready-immediately`	`PollForMessageInput` with `ready=true` on first poll	`PollForMessageOutput` with `ready=true` content; no callback fired	`happy path`	GPU available from start; answer returned by dispatcher or worker immediately	`Y`
`poll-message.gpu-pending-detected`	`PollForMessageInput` with `gpu_pending=true` returned on 2nd+ poll	`onGpuPending` callback fired once; polling continues at 30s interval; no error thrown	`new path`	Detects `gpu_pending=true`, switches interval, calls callback, continues polling	`Y`
`poll-message.gpu-pending-resolves`	`PollForMessageInput` with `gpu_pending=true` then `ready=true` after 30s+	`PollForMessageOutput` with final answer content; callback fired once earlier	`new subpath`	After gpu_pending detected and callback fired, continues polling at 30s intervals until `ready=true`	`Y`
`poll-message.gpu-pending-timeout`	`PollForMessageInput` with `gpu_pending=true` for >30 minutes	`PollTimeoutError` thrown; `maxWaitMs` (30min) exceeded	`error`	Backend GPU retry Lambda exceeds max attempts; user sees timeout error	`Y`
`poll-message.timeout-no-gpu-pending`	`PollForMessageInput` with `ready=false, gpu_pending=false` for >60s	`PollTimeoutError` thrown; `maxWaitMs` (60s default) exceeded	`error`	Standard polling timeout when GPU is not involved	`N`
`poll-message.backward-compatible`	`PollForMessageInput` without `onGpuPending` or `gpuPendingIntervalMs`	Polling uses default intervals and ignores the `gpu_pending` attribute	`legacy path`	Defaults applied; callback optional	`N`

Flow: `chat-screen-gpu-pending-ui-update`

Test files: main/app/__tests__/chat-screen-polling.test.tsx
Core files: main/app/app/chat.tsx

Type Definitions

ChatScreenGpuPendingInput {
  placeholderId: string (message_id from dispatcher response)
  placeholderContent: string (current empty or temporary content)
  onGpuPending: callback (fired when gpu_pending detected during polling)
}

ChatScreenGpuPendingOutput {
  placeholderUpdated: boolean (true when "Starting up..." message set)
  finalAnswerUpdated: boolean (true when actual answer replaces placeholder)
  userState: "loading-thinking" | "loading-gpu-pending" | "answer-ready"
}
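
The three `userState` values can be derived from the flags on the polled message. A minimal sketch, assuming `ready` takes priority over `gpu_pending` (as in the pseudocode's check order); `deriveUserState` and `PolledFlags` are hypothetical names.

```typescript
type UserState = "loading-thinking" | "loading-gpu-pending" | "answer-ready";

// Only the two flags that drive the UI state; other ChatMessage fields are omitted.
interface PolledFlags {
  ready: boolean;
  gpu_pending?: boolean;
}

function deriveUserState(msg: PolledFlags): UserState {
  // A ready message wins even if gpu_pending was never cleared by the backend.
  if (msg.ready) return "answer-ready";
  return msg.gpu_pending === true ? "loading-gpu-pending" : "loading-thinking";
}
```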

Paths

path-name	input	output/expected state change	path-type	notes	updated
`chat-screen.gpu-pending-callback-fired`	`onGpuPending()` triggered from pollForMessage	Message bubble text updated to "Starting up the GPU, we will notify you when we have an answer"; id stays same (placeholderId)	`new path`	Callback handler replaces empty bubble with "Starting up..." text; keyed to same placeholderId	`Y`
`chat-screen.gpu-pending-resolves-to-answer`	`pollForMessage()` resolves with `ready=true content` after gpu_pending	Same placeholder (placeholderId) updated with final answer; isLoading → false	`new subpath`	Final answer replaces "Starting up..." text in same bubble	`Y`
`chat-screen.gpu-unavailable-error-path`	`chatWithMemory()` throws `GpuUnavailableError`	Shows "Starting up the GPU, we will notify you when we have an answer"; no polling starts (different error path)	`existing path`	This path (synchronous dispatcher failure) is unchanged; polling does not start	`N`
`chat-screen.gpu-pending-timeout-error`	`pollForMessage()` exceeds 30min timeout while gpu_pending	Error message shown; isLoading → false; no retry UI	`error`	User sees generic error message; GPU retry Lambda gave up	`Y`
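
The two error paths in the table can be told apart in a catch block by error name. This is a hedged sketch: it assumes `GpuUnavailableError` sets `name` to its class name, `classifyFailure` and `bubbleTextFor` are hypothetical helpers, and the generic error wording is a placeholder because the plan does not specify it.

```typescript
type FailureKind = "gpu-unavailable" | "poll-timeout" | "unknown";

function classifyFailure(err: unknown): FailureKind {
  if (!(err instanceof Error)) return "unknown";
  if (err.name === "GpuUnavailableError") return "gpu-unavailable"; // assumed name
  if (err.name === "PollTimeoutError") return "poll-timeout";
  return "unknown";
}

const GPU_STARTING_TEXT =
  "Starting up the GPU, we will notify you when we have an answer";
// Placeholder wording; the plan only says a generic error message is shown.
const GENERIC_ERROR_TEXT = "Something went wrong. Please try again.";

function bubbleTextFor(kind: FailureKind): string {
  // Only the synchronous dispatcher failure keeps the "Starting up" text.
  return kind === "gpu-unavailable" ? GPU_STARTING_TEXT : GENERIC_ERROR_TEXT;
}
```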

3. Pseudocode / Technical Details

Critical Flow: poll-message-with-gpu-pending

pollForMessage(options):
  chat_id = options.chat_id
  message_id = options.message_id
  intervalMs = options.intervalMs ?? 500
  gpuPendingIntervalMs = options.gpuPendingIntervalMs ?? 30000
  onGpuPending = options.onGpuPending ?? null
  maxWaitMs = options.maxWaitMs ?? 60000

  // When gpu_pending is possible, extend timeout to 30 minutes
  if onGpuPending is not null:
    maxWaitMs = max(maxWaitMs, 1800000)  // 30 min

  deadline = now() + maxWaitMs
  gpuPendingCallbackFired = false
  currentIntervalMs = intervalMs

  poll():
    if now() >= deadline:
      throw PollTimeoutError()

    messages = GET /chats/messages?message_id=message_id
    msg = messages[0]

    if msg.ready === true:
      log("poll-message", "ready", {...})
      return msg

    // NEW: Detect gpu_pending and switch to slow polling.
    // Only when onGpuPending is provided; otherwise gpu_pending is ignored (legacy behavior).
    if onGpuPending is not null and msg.gpu_pending === true and not gpuPendingCallbackFired:
      log("poll-message", "gpu-pending-detected", {...})
      gpuPendingCallbackFired = true
      onGpuPending()  // Notify chat.tsx to update UI
      currentIntervalMs = gpuPendingIntervalMs

    log("poll-message", "not-ready", {ready: msg.ready, gpu_pending: msg.gpu_pending})

    if now() >= deadline:
      throw PollTimeoutError()

    sleep(currentIntervalMs)
    return poll()  // Continue with updated interval

  return poll()
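
The pseudocode above could look roughly like this in TypeScript. It is a sketch, not the contents of pollForMessage.ts: the `PollDeps` injection (`fetchMessage`, `sleep`, `now`) is an assumption added so the loop can be tested with a fake clock, logging is omitted, and capping the sleep at the remaining time is a choice that goes slightly beyond the pseudocode. Following the backward-compatibility note, `gpu_pending` is acted on only when `onGpuPending` is supplied.

```typescript
interface ChatMessage {
  message_id: string;
  content: string;
  ready: boolean;
  gpu_pending?: boolean;
}

interface PollOptions {
  chat_id: string;
  message_id: string;
  intervalMs?: number;
  maxWaitMs?: number;
  onGpuPending?: () => void;
  gpuPendingIntervalMs?: number;
}

// Hypothetical seams standing in for GET /chats/messages, timers, and the clock.
interface PollDeps {
  fetchMessage: (chatId: string, messageId: string) => Promise<ChatMessage | undefined>;
  sleep: (ms: number) => Promise<void>;
  now: () => number;
}

class PollTimeoutError extends Error {
  constructor() {
    super("Timed out waiting for message to be ready");
    this.name = "PollTimeoutError";
    Object.setPrototypeOf(this, PollTimeoutError.prototype);
  }
}

async function pollForMessage(options: PollOptions, deps: PollDeps): Promise<ChatMessage> {
  const { chat_id, message_id, onGpuPending } = options;
  const gpuPendingIntervalMs = options.gpuPendingIntervalMs ?? 30000;
  let maxWaitMs = options.maxWaitMs ?? 60000;
  if (onGpuPending) maxWaitMs = Math.max(maxWaitMs, 1800000); // 30-minute floor

  const deadline = deps.now() + maxWaitMs;
  let currentIntervalMs = options.intervalMs ?? 500;
  let gpuPendingCallbackFired = false;

  // A loop replaces the recursive poll() from the pseudocode.
  while (deps.now() < deadline) {
    const msg = await deps.fetchMessage(chat_id, message_id);
    if (msg?.ready === true) return msg;

    if (onGpuPending && msg?.gpu_pending === true && !gpuPendingCallbackFired) {
      gpuPendingCallbackFired = true;
      onGpuPending(); // let the chat screen update the placeholder once
      currentIntervalMs = gpuPendingIntervalMs; // stays slow for the rest of the session
    }

    const remainingMs = deadline - deps.now();
    if (remainingMs <= 0) break;
    // Never sleep past the deadline.
    await deps.sleep(Math.min(currentIntervalMs, remainingMs));
  }
  throw new PollTimeoutError();
}
```

In the app, `deps` would wrap the real API client, `setTimeout`, and `Date.now`; in tests, a fake clock lets a 30-minute scenario finish instantly.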

Critical Flow: chat-screen-gpu-pending-callback

askQuestion(question):
  // ... send question, get message_id ...

  if status === "thinking" and message_id and chat_id:
    placeholderId = message_id
    setMessages((prev) => [...prev, {id: placeholderId, content: "", role: "assistant"}])

    // NEW: Define callback before polling
    onGpuPendingCallback = () => {
      setMessages((prev) =>
        prev.map((m) =>
          m.id === placeholderId
            ? {...m, content: "Starting up the GPU, we will notify you when we have an answer"}
            : m
        )
      )
    }

    // Poll with callback and extended timeout
    readyMsg = await pollForMessage({
      chat_id,
      message_id,
      onGpuPending: onGpuPendingCallback,
      gpuPendingIntervalMs: 30000,
      maxWaitMs: 1800000  // 30 min
    })

    // When poll resolves, replace with final answer
    setMessages((prev) =>
      prev.map((m) =>
        m.id === placeholderId
          ? {...m, content: readyMsg.content}
          : m
      )
    )
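
Both `setMessages` updaters in the flow above do the same thing: replace the content of one bubble by id. A minimal sketch of that as a pure function, assuming a simplified `UiMessage` shape; `replaceContent` and `GPU_PENDING_TEXT` are hypothetical names.

```typescript
interface UiMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
}

const GPU_PENDING_TEXT =
  "Starting up the GPU, we will notify you when we have an answer";

// Returns a new array; only the message with the matching id gets new content.
function replaceContent(messages: UiMessage[], id: string, content: string): UiMessage[] {
  return messages.map((m) => (m.id === id ? { ...m, content } : m));
}
```

With this helper the callback becomes `setMessages((prev) => replaceContent(prev, placeholderId, GPU_PENDING_TEXT))`, and the final update passes `readyMsg.content` instead.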

Implementation Notes

Backward Compatibility: onGpuPending is optional. If not provided, polling behaves as before (no gpu_pending detection, no interval switching).
Callback Firing: The callback fires exactly once per poll session, triggered by the first detection of gpu_pending=true. A flag (gpuPendingCallbackFired) ensures it doesn't fire repeatedly.
Interval Switching: Once gpu_pending=true is detected, the polling interval switches permanently from intervalMs (500ms) to gpuPendingIntervalMs (30s default). This reduces backend load during long retries.
Timeout Extension: When onGpuPending is provided, maxWaitMs is automatically extended to at least 30 minutes (1800000ms) to match the backend GPU retry Lambda's max attempts (15 retries × 2-min intervals).
Error Paths: The GpuUnavailableError path (synchronous dispatcher failure) is unchanged — no polling starts, no callback needed.
Placeholder Keying: The same placeholderId (message_id from dispatcher) is used for both the initial empty bubble and the "Starting up..." text, and finally the resolved answer. This ensures a smooth visual transition in the same bubble position.
Finally Block: The finally block in askQuestion() must not call setIsLoading(false) until pollForMessage() settles, that is, until it either resolves with the answer or throws (for example PollTimeoutError). This is already correct in the current code.
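
The numbers in the Interval Switching and Timeout Extension notes can be checked with a few lines of arithmetic. The sketch below only restates figures from the notes; `maxPolls` is a hypothetical helper giving an upper bound on requests if the whole window is spent at one interval.

```typescript
// Retry budget quoted in the Timeout Extension note: 15 retries at 2-minute intervals.
const RETRY_ATTEMPTS = 15;
const RETRY_INTERVAL_MINUTES = 2;
const retryWindowMs = RETRY_ATTEMPTS * RETRY_INTERVAL_MINUTES * 60 * 1000; // 1800000

// Upper bound on GET /chats/messages calls over a window at a fixed interval.
function maxPolls(windowMs: number, intervalMs: number): number {
  return Math.ceil(windowMs / intervalMs);
}

const fastPolls = maxPolls(retryWindowMs, 500);   // 3600 requests at 500ms
const slowPolls = maxPolls(retryWindowMs, 30000); // 60 requests at 30s
```

Over a full 30-minute retry window, the slow interval cuts the worst case from 3600 requests to 60.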

4. Acceptance Criteria

The following tests validate the GPU pending polling behavior:

Test 1: gpu-pending-detected-callback-fired

Given: pollForMessage() with onGpuPending callback; backend returns gpu_pending=true on second poll
Then: Callback fires exactly once; polling continues; no error thrown; interval switches to 30s

Test 2: gpu-pending-resolves-with-answer

Given: pollForMessage() detects gpu_pending=true, callback fires; after 30s+, backend returns ready=true with answer
Then: Poll resolves with final message; chat screen updates placeholder bubble with answer

Test 3: gpu-pending-ui-update-flow

Given: Chat screen calls askQuestion(), dispatcher returns message_id, poll detects gpu_pending=true
Then: Placeholder bubble created; onGpuPending callback fires; bubble text updates to "Starting up..."; user can see progress without reloading

Test 4: gpu-pending-timeout-exceeded

Given: Poll with gpu_pending=true for >30 minutes without ready=true
Then: PollTimeoutError thrown; error message displayed; isLoading set to false

Test 5: backward-compatibility-no-callback

Given: pollForMessage() called without onGpuPending callback
Then: Polling behavior unchanged; gpu_pending attribute ignored; standard 60s timeout applies

Test 6: gpu-unavailable-error-unchanged

Given: chatWithMemory() throws GpuUnavailableError (synchronous dispatcher failure)
Then: No polling starts; "Starting up..." message shown in catch block; different from gpu_pending path

Test 7: gpu-pending-callback-fires-only-once

Given: Poll receives multiple messages with gpu_pending=true
Then: Callback fired on first detection only; does not fire on subsequent polling rounds

Test 8: interval-switch-verified

Given: Poll starts with 500ms interval; detects gpu_pending=true
Then: Subsequent polls use 30s interval; confirmed via logging or explicit timing checks

ONCE YOU GET APPROVAL FROM THE DEVELOPER FOR STAGES 1–3, DELETE THESE LINES AND UPDATE THE STAGE GATE TRACKER BELOW

Stage Gate Tracker (Updated)

[ ] Stage 1 Mermaid approved
[ ] Stage 2 I/O contracts approved
[ ] Stage 3 pseudocode/technical details approved or skipped
