GPU Pending Polling Fix

Plan Metadata

  • Plan type: plan
  • Parent plan: N/A
  • Depends on: docs/plans/gpu-retry-notification.md
  • Status: draft

Status semantics:

  • draft: Plan is being created or updated and is not final.
  • approved: Plan is approved but not yet applied in code.
  • documentation: Code currently exists and matches the plan contract.

System Intent

When the GPU is starting up and the user sees "Starting up the GPU, we will notify you when we have an answer", the chat message currently never updates when the answer finally arrives. The user must manually reload to see the result.

Solution: Detect when pollForMessage encounters gpu_pending=true (from the backend's mark_gpu_pending call). Instead of throwing an error (which stops polling), switch to slow polling (every 30s) with a 30-minute timeout. Update the placeholder bubble text once to "Starting up...", then wait for the backend retry Lambda to finish and save the answer with ready=true. The existing placeholder-update flow then replaces the text with the actual answer.

Primary consumers:

  • Chat screen (main/app/app/chat.tsx)
  • Message polling logic (main/app/lib/api/chats/pollForMessage.ts)

Boundary (black-box scope):

  • Input: pollForMessage() options with chat_id and message_id
  • Output: ChatMessage with ready=true and final content
  • Intermediate state: gpu_pending messages detected in polling loop

Stage Gate Tracker

  • [ ] Stage 1 Mermaid approved
  • [ ] Stage 2 I/O contracts approved
  • [ ] Stage 3 pseudocode/technical details approved or skipped

1. Mermaid Diagram

graph TD
    ChatScreen["Chat Screen\n(chat.tsx)"]:::app
    PollFunc["pollForMessage()\n(pollForMessage.ts)"]:::lib
    GetMessages["GET /chats/messages"]:::api
    Dispatcher["Dispatcher Lambda"]:::backend
    Worker["Async Worker Lambda"]:::backend
    Retry["ChatGpuRetryFunction\n(backend)"]:::backend
    DDB["DynamoDB Messages"]:::data

    ChatScreen -->|onGpuPending callback +\n30-second polling| PollFunc
    ChatScreen -->|"POST /memories/chat\n(question)"| Dispatcher
    Dispatcher -->|save placeholder\nready=false gpu_pending=false| DDB
    Dispatcher -->|invoke worker async\nreturn message_id| ChatScreen

    ChatScreen -->|"addMessage(placeholder)"| ChatScreen
    ChatScreen -->|start polling\nwith callback| PollFunc

    PollFunc -->|GET /chats/messages\nevery 500ms| GetMessages
    GetMessages -->|ready=false gpu_pending=false| DDB
    GetMessages -->|returns message| PollFunc

    PollFunc -->|gpu_pending=true detected\ncall onGpuPending() once| ChatScreen
    ChatScreen -->|update placeholder bubble\nto Starting up text| ChatScreen

    PollFunc -->|switch to 30s interval| PollFunc
    Worker -->|GPU timeout| Retry
    Retry -->|save answer ready=true| DDB

    PollFunc -->|GET /chats/messages\nevery 30s| GetMessages
    GetMessages -->|ready=true + content| DDB
    GetMessages -->|returns message| PollFunc
    PollFunc -->|resolve with answer| ChatScreen

    ChatScreen -->|update placeholder\nwith final answer| ChatScreen

    classDef app fill:#ffe58a,stroke:#888
    classDef lib fill:#a8e6a3,stroke:#888
    classDef api fill:#d3d3d3,stroke:#888
    classDef backend fill:#87ceeb,stroke:#888
    classDef data fill:#dda0dd,stroke:#888

2. Black-Box Inputs and Outputs

Global Types

ChatMessage {
  message_id: string (unique message identifier)
  chat_id: string (chat session identifier)
  role: "user" | "assistant"
  content: string (message body text)
  created_at: timestamp
  ready: boolean (true = content ready, false = placeholder pending)
  updated_at: timestamp (optional)
  gpu_pending: boolean (optional; true = GPU starting up, answer pending on retry)
}

PollForMessageOptions {
  chat_id: string (required; chat session ID)
  message_id: string (required; message to poll)
  intervalMs: number (optional; default 500ms — initial polling interval)
  maxWaitMs: number (optional; default 60000ms = 60s; raised to at least 1800000ms = 30min when onGpuPending is provided)
  onGpuPending: () => void (optional; callback when gpu_pending detected; fires once)
  gpuPendingIntervalMs: number (optional; default 30000ms — slow polling interval when gpu_pending)
}

ChatWithMemoryResponse {
  chat_id: string | null
  message_id: string | null
  status: string | undefined (e.g., "thinking")
  answer: string | undefined
  trace: Trace | undefined
}

Flow: poll-message-with-gpu-pending

  • Test files: main/app/__tests__/chat-screen-polling.test.tsx
  • Core files: main/app/lib/api/chats/pollForMessage.ts, main/app/app/chat.tsx

Type Definitions

PollForMessageInput {
  chat_id: string (required)
  message_id: string (required)
  intervalMs: number (initial polling interval, default 500ms)
  maxWaitMs: number (total timeout, default 60s; auto-extended to at least 30min when onGpuPending is provided)
  onGpuPending: () => void (optional callback; called once when gpu_pending detected)
  gpuPendingIntervalMs: number (slow polling interval when gpu_pending, default 30s)
}

PollForMessageOutput {
  message: ChatMessage (with ready=true and final content)
}

PollForMessageError {
  name: "PollTimeoutError"
  message: "Timed out waiting for message to be ready"
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| poll-message.ready-immediately | PollForMessageInput with ready=true on first poll | PollForMessageOutput with ready=true content; no callback fired | happy path | GPU available from start; answer returned by dispatcher or worker immediately | Y |
| poll-message.gpu-pending-detected | PollForMessageInput with gpu_pending=true returned on 2nd+ poll | onGpuPending callback fired once; polling continues at 30s interval; no error thrown | new path | Detects gpu_pending=true, switches interval, calls callback, continues polling | Y |
| poll-message.gpu-pending-resolves | PollForMessageInput with gpu_pending=true then ready=true after 30s+ | PollForMessageOutput with final answer content; callback fired once earlier | new subpath | After gpu_pending detected and callback fired, continues polling at 30s intervals until ready=true | Y |
| poll-message.gpu-pending-timeout | PollForMessageInput with gpu_pending=true for >30 minutes | PollTimeoutError thrown; maxWaitMs (30min) exceeded | error | Backend GPU retry Lambda exceeds max attempts; user sees timeout error | Y |
| poll-message.timeout-no-gpu-pending | PollForMessageInput with ready=false, gpu_pending=false for >60s | PollTimeoutError thrown; maxWaitMs (60s default) exceeded | error | Standard polling timeout when GPU is not involved | |
| poll-message.backward-compatible | PollForMessageInput without onGpuPending or gpuPendingIntervalMs | Polling uses default intervals and ignores gpu_pending attribute | legacy path | Defaults applied; callback optional | |

Flow: chat-screen-gpu-pending-ui-update

  • Test files: main/app/__tests__/chat-screen-polling.test.tsx
  • Core files: main/app/app/chat.tsx

Type Definitions

ChatScreenGpuPendingInput {
  placeholderId: string (message_id from dispatcher response)
  placeholderContent: string (current empty or temporary content)
  onGpuPending: callback (fired when gpu_pending detected during polling)
}

ChatScreenGpuPendingOutput {
  placeholderUpdated: boolean (true when "Starting up..." message set)
  finalAnswerUpdated: boolean (true when actual answer replaces placeholder)
  userState: "loading-thinking" | "loading-gpu-pending" | "answer-ready"
}
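
The three userState values can be derived from the fields of a polled message. The mapping below is a minimal sketch and an assumption, not something the plan specifies: deriveUserState and PolledMessage are hypothetical names, and the rule that ready takes precedence over gpu_pending is inferred from the flow where a gpu_pending message eventually becomes ready.

```typescript
// Hypothetical helper: maps a polled message to the userState values
// listed in ChatScreenGpuPendingOutput.
type UserState = "loading-thinking" | "loading-gpu-pending" | "answer-ready";

// Only the two fields the mapping needs; a subset of ChatMessage.
interface PolledMessage {
  ready: boolean;
  gpu_pending?: boolean;
}

function deriveUserState(msg: PolledMessage): UserState {
  // Assumption: a ready message wins even if gpu_pending is still set.
  if (msg.ready) return "answer-ready";
  return msg.gpu_pending === true ? "loading-gpu-pending" : "loading-thinking";
}
```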

Paths

path-name input output/expected state change path-type notes updated
chat-screen.gpu-pending-callback-fired onGpuPending() triggered from pollForMessage Message bubble text updated to "Starting up the GPU, we will notify you when we have an answer"; id stays same (placeholderId) new path Callback handler replaces empty bubble with "Starting up..." text; keyed to same placeholderId Y
chat-screen.gpu-pending-resolves-to-answer pollForMessage() resolves with ready=true content after gpu_pending Same placeholder (placeholderId) updated with final answer; isLoading → false new subpath Final answer replaces "Starting up..." text in same bubble Y
chat-screen.gpu-unavailable-error-path chatWithMemory() throws GpuUnavailableError Shows "Starting up the GPU, we will notify you when we have an answer"; no polling starts (different error path) existing path This path (synchronous dispatcher failure) is unchanged; polling does not start
chat-screen.gpu-pending-timeout-error pollForMessage() exceeds 30min timeout while gpu_pending Error message shown; isLoading → false; no retry UI error User sees generic error message; GPU retry Lambda gave up Y

3. Pseudocode / Technical Details

Critical Flow: poll-message-with-gpu-pending

pollForMessage(options):
  chat_id = options.chat_id
  message_id = options.message_id
  intervalMs = options.intervalMs ?? 500
  gpuPendingIntervalMs = options.gpuPendingIntervalMs ?? 30000
  onGpuPending = options.onGpuPending ?? null
  maxWaitMs = options.maxWaitMs ?? 60000

  // When gpu_pending is possible, extend timeout to 30 minutes
  if onGpuPending is not null:
    maxWaitMs = max(maxWaitMs, 1800000)  // 30 min

  deadline = now() + maxWaitMs
  gpuPendingCallbackFired = false
  currentIntervalMs = intervalMs

  poll():
    if now() >= deadline:
      throw PollTimeoutError()

    messages = GET /chats/messages?message_id=message_id
    msg = messages[0]

    if msg.ready === true:
      log("poll-message", "ready", {...})
      return msg

    // NEW: Detect gpu_pending and switch to slow polling.
    // Only active when onGpuPending is provided, so callers without the
    // callback keep the legacy behavior (see Implementation Note 1).
    if onGpuPending is not null and msg.gpu_pending === true and not gpuPendingCallbackFired:
      log("poll-message", "gpu-pending-detected", {...})
      gpuPendingCallbackFired = true
      onGpuPending()  // Notify chat.tsx to update UI
      currentIntervalMs = gpuPendingIntervalMs

    log("poll-message", "not-ready", {ready: msg.ready, gpu_pending: msg.gpu_pending})

    if now() >= deadline:
      throw PollTimeoutError()

    sleep(currentIntervalMs)
    return poll()  // Continue with updated interval

  return poll()
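
The pseudocode above could be sketched in TypeScript roughly as follows. This is a minimal sketch under stated assumptions, not the contents of pollForMessage.ts. The PollDeps parameter (fetchMessage, sleep, now) is a hypothetical injection point added so the loop can be exercised without real timers or network calls. The loop is iterative rather than recursive, and gpu_pending detection is assumed to be active only when onGpuPending is supplied, as Implementation Note 1 describes.

```typescript
interface ChatMessage {
  message_id: string;
  chat_id: string;
  role: "user" | "assistant";
  content: string;
  ready: boolean;
  gpu_pending?: boolean;
}

interface PollForMessageOptions {
  chat_id: string;
  message_id: string;
  intervalMs?: number;
  maxWaitMs?: number;
  onGpuPending?: () => void;
  gpuPendingIntervalMs?: number;
}

// Hypothetical dependencies, injected so tests can fake the clock and backend.
interface PollDeps {
  fetchMessage: (chatId: string, messageId: string) => Promise<ChatMessage | undefined>;
  sleep: (ms: number) => Promise<void>;
  now: () => number;
}

class PollTimeoutError extends Error {
  constructor() {
    super("Timed out waiting for message to be ready");
    this.name = "PollTimeoutError";
  }
}

const GPU_PENDING_MAX_WAIT_MS = 30 * 60 * 1000; // 30 minutes

async function pollForMessage(
  options: PollForMessageOptions,
  deps: PollDeps
): Promise<ChatMessage> {
  const { chat_id, message_id, onGpuPending } = options;
  const gpuPendingIntervalMs = options.gpuPendingIntervalMs ?? 30_000;
  let currentIntervalMs = options.intervalMs ?? 500;
  let maxWaitMs = options.maxWaitMs ?? 60_000;

  // Extend the timeout only when the caller opted in to gpu_pending handling.
  if (onGpuPending) maxWaitMs = Math.max(maxWaitMs, GPU_PENDING_MAX_WAIT_MS);

  const deadline = deps.now() + maxWaitMs;
  let gpuPendingCallbackFired = false;

  while (deps.now() < deadline) {
    const msg = await deps.fetchMessage(chat_id, message_id);
    if (msg?.ready === true) return msg;

    // First gpu_pending sighting: notify once, then poll slowly from here on.
    if (onGpuPending && msg?.gpu_pending === true && !gpuPendingCallbackFired) {
      gpuPendingCallbackFired = true;
      onGpuPending();
      currentIntervalMs = gpuPendingIntervalMs;
    }

    await deps.sleep(currentIntervalMs);
  }

  throw new PollTimeoutError();
}
```

Injecting sleep and now lets a test advance a fake clock and record the requested delays, which is one way to check the interval switch without waiting 30 seconds.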

Critical Flow: chat-screen-gpu-pending-callback

askQuestion(question):
  // ... send question, get message_id ...

  if status === "thinking" and message_id and chat_id:
    placeholderId = message_id
    setMessages([...prev, {id: placeholderId, content: "", role: "assistant"}])

    // NEW: Define callback before polling
    onGpuPendingCallback = () => {
      setMessages((prev) =>
        prev.map((m) =>
          m.id === placeholderId
            ? {...m, content: "Starting up the GPU, we will notify you when we have an answer"}
            : m
        )
      )
    }

    // Poll with callback and extended timeout
    readyMsg = await pollForMessage({
      chat_id,
      message_id,
      onGpuPending: onGpuPendingCallback,
      gpuPendingIntervalMs: 30000,
      maxWaitMs: 1800000  // 30 min
    })

    // When poll resolves, replace with final answer
    setMessages((prev) =>
      prev.map((m) =>
        m.id === placeholderId
          ? {...m, content: readyMsg.content}
          : m
      )
    )
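
Both setMessages calls above apply the same update: replace the content of the bubble whose id matches placeholderId and leave every other message untouched. A minimal sketch of that update as a pure helper follows. The names UiMessage, replaceContent, and GPU_PENDING_TEXT are hypothetical and do not come from chat.tsx.

```typescript
// Hypothetical shape of a message held in chat screen state.
interface UiMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
}

const GPU_PENDING_TEXT =
  "Starting up the GPU, we will notify you when we have an answer";

// Returns a new array in which only the message with the given id has its
// content replaced; all other messages are returned unchanged.
function replaceContent(
  messages: readonly UiMessage[],
  id: string,
  content: string
): UiMessage[] {
  return messages.map((m) => (m.id === id ? { ...m, content } : m));
}
```

With such a helper, the callback body could reduce to setMessages((prev) => replaceContent(prev, placeholderId, GPU_PENDING_TEXT)), and the final update to the same call with readyMsg.content.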

Implementation Notes

  1. Backward Compatibility: onGpuPending is optional. If not provided, polling behaves as before (no gpu_pending detection, no interval switching).

  2. Callback Firing: The callback fires exactly once per poll session, triggered by the first detection of gpu_pending=true. A flag (gpuPendingCallbackFired) ensures it doesn't fire repeatedly.

  3. Interval Switching: Once gpu_pending=true is detected, the polling interval switches permanently from intervalMs (500ms) to gpuPendingIntervalMs (30s default). This reduces backend load during long retries.

  4. Timeout Extension: When onGpuPending is provided, maxWaitMs is automatically extended to at least 30 minutes (1800000ms) to match the backend GPU retry Lambda's max attempts (15 retries × 2-min intervals).

  5. Error Paths: The GpuUnavailableError path (synchronous dispatcher failure) is unchanged — no polling starts, no callback needed.

  6. Placeholder Keying: The same placeholderId (message_id from dispatcher) is used for the initial empty bubble, the "Starting up..." text, and the resolved answer. This ensures a smooth visual transition in the same bubble position.

  7. Finally Block: The finally block in askQuestion() must call setIsLoading(false) only after pollForMessage() settles, whether it resolves with the answer or rejects with a timeout. This is already correct in the current code.
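
The figures in notes 3 and 4 can be checked with a little arithmetic. The sketch below uses the values stated in those notes (15 retries, 2-minute spacing, 500ms and 30s intervals). The constant and function names are hypothetical, and the request counts are approximate upper bounds that ignore request latency.

```typescript
// Values from Implementation Notes 3 and 4; names are illustrative only.
const RETRY_ATTEMPTS = 15;
const RETRY_INTERVAL_MS = 2 * 60 * 1000; // 2 minutes
const FAST_INTERVAL_MS = 500;
const SLOW_INTERVAL_MS = 30_000;

// Approximate upper bound on GET /chats/messages requests in a wait window.
function maxPolls(windowMs: number, intervalMs: number): number {
  return Math.ceil(windowMs / intervalMs);
}

const retryWindowMs = RETRY_ATTEMPTS * RETRY_INTERVAL_MS;

console.log(retryWindowMs); // 1800000 (30 minutes)
console.log(maxPolls(retryWindowMs, FAST_INTERVAL_MS)); // 3600
console.log(maxPolls(retryWindowMs, SLOW_INTERVAL_MS)); // 60
```

Staying at the 500ms interval for the full retry window would mean up to about 3600 requests per pending message, against about 60 at the 30s interval, which is the load reduction note 3 refers to.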


4. Acceptance Criteria

The following tests validate the GPU pending polling behavior:

Test 1: gpu-pending-detected-callback-fired

Given: pollForMessage() with onGpuPending callback; backend returns gpu_pending=true on second poll
Then: Callback fires exactly once; polling continues; no error thrown; interval switches to 30s

Test 2: gpu-pending-resolves-with-answer

Given: pollForMessage() detects gpu_pending=true, callback fires; after 30s+, backend returns ready=true with answer
Then: Poll resolves with final message; chat screen updates placeholder bubble with answer

Test 3: gpu-pending-ui-update-flow

Given: Chat screen calls askQuestion(), dispatcher returns message_id, poll detects gpu_pending=true
Then: Placeholder bubble created; onGpuPending callback fires; bubble text updates to "Starting up..."; user can see progress without reloading

Test 4: gpu-pending-timeout-exceeded

Given: Poll with gpu_pending=true for >30 minutes without ready=true
Then: PollTimeoutError thrown; error message displayed; isLoading set to false

Test 5: backward-compatibility-no-callback

Given: pollForMessage() called without onGpuPending callback
Then: Polling behavior unchanged; gpu_pending attribute ignored; standard 60s timeout applies

Test 6: gpu-unavailable-error-unchanged

Given: chatWithMemory() throws GpuUnavailableError (synchronous dispatcher failure)
Then: No polling starts; "Starting up..." message shown in catch block; different from gpu_pending path

Test 7: gpu-pending-callback-fires-only-once

Given: Poll receives multiple messages with gpu_pending=true
Then: Callback fired on first detection only; does not fire on subsequent polling rounds

Test 8: interval-switch-verified

Given: Poll starts with 500ms interval; detects gpu_pending=true
Then: Subsequent polls use 30s interval; confirmed via logging or explicit timing checks


ONCE YOU GET APPROVAL FROM THE DEVELOPER FOR STAGES 1–3, DELETE THESE LINES AND UPDATE THE STAGE GATE TRACKER BELOW

Stage Gate Tracker (Updated)

  • [ ] Stage 1 Mermaid approved
  • [ ] Stage 2 I/O contracts approved
  • [ ] Stage 3 pseudocode/technical details approved or skipped