GPU Readiness Flow - Ambiguous "Waiting" Message and Unclear Status

Metadata

Date: 2026-05-09
Status: fixed
Severity: high
User Impact: Chat feature unreliable; users cannot tell GPU status or when to retry
Affected Components: Chat frontend, Chat Lambda, GPU worker resolution

Problem

Symptom: User receives "Chat is not available right now — the GPU worker is starting up. Please try again in a few minutes." with no clarity on:
- Whether the GPU is actually unavailable or just slow
- How long to wait before retrying
- What the actual GPU state is (stopped, starting, terminated, unhealthy)
- Whether retrying immediately is futile or might succeed

Root Causes

Opaque Fallback Message: The Lambda returns the same generic fallback for all GPU unavailable states.
Fallback Saved to Chat: The fallback message is saved to DynamoDB as if it were a bot answer.
No Retry Guidance: The response carries no retry_after_seconds or reason code, so the frontend cannot tell when, or whether, to retry.
Single Health Check: GPU resolution happens once per request.
GPU State Not Tracked: The code doesn't distinguish between pending, stopped, running, and terminated states.
Frontend Cannot Display Status: The frontend has no UI component to show GPU warm-up progress.

Solution

Backend (app.py)

Return a structured gpu_status object with ready, reason, retry_after_seconds, and message
Modify _resolve_gpu_url() to return a tuple that includes the status reason
Do NOT save the GPU-unavailable message to DynamoDB
Add logging for each GPU stage

Frontend (chat.tsx)

Display GPU status in a yellow, centered box
Auto-retry for transient failures (up to 5 retries)
Show "Retrying automatically..." during retry
Reset retry count on success
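
The retry policy above might look like the following sketch. The real logic lives in chat.tsx; it is written here in Python for illustration. It assumes that a failure counts as transient when `retry_after_seconds` is non-null, which this report does not state explicitly. `next_retry_delay` and `run_with_retries` are hypothetical names.

```python
from typing import Any, Callable, Dict, Optional, Tuple

MAX_RETRIES = 5  # from this report: up to 5 automatic retries


def next_retry_delay(
    gpu_status: Optional[Dict[str, Any]], retries_so_far: int
) -> Optional[int]:
    """Return seconds to wait before retrying, or None to stop retrying."""
    if not gpu_status or gpu_status.get("ready"):
        return None  # success (or no GPU status): nothing to retry
    if retries_so_far >= MAX_RETRIES:
        return None  # retry budget exhausted
    # Assumption: a null retry_after_seconds marks a non-transient failure.
    return gpu_status.get("retry_after_seconds")


def run_with_retries(
    send: Callable[[], Dict[str, Any]], sleep: Callable[[int], None]
) -> Tuple[Dict[str, Any], int]:
    """Send one chat request, auto-retrying while the GPU reports a transient state."""
    retries = 0  # local counter, so each new request starts from zero
    while True:
        response = send()
        delay = next_retry_delay(response.get("gpu_status"), retries)
        if delay is None:
            return response, retries
        sleep(delay)
        retries += 1
```

Passing `sleep` in as a parameter keeps the policy testable without real waiting. In a UI, the equivalent would be a timer that also drives the "Retrying automatically..." indicator.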

Types (chatWithMemory.ts)

Export GPUStatus type with reason enum
ChatWithMemoryResponse.answer can be null when the GPU is unavailable

Tests

Add GPU status response tests
Update _resolve_gpu_url mocks to return tuple
Test that fallback is NOT saved to history
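
A test along these lines could cover the last two items. `handle_chat` is a hypothetical stand-in for the Lambda handler, and passing the resolver, history table, and GPU client as arguments is an assumption made so the example is self-contained. The actual tests presumably patch `_resolve_gpu_url` in app.py instead.

```python
from unittest.mock import MagicMock


def handle_chat(question, resolve_gpu_url, history_table, ask_gpu):
    """Hypothetical stand-in for the chat Lambda handler."""
    url, reason = resolve_gpu_url()  # resolver now returns a (url, reason) tuple
    if url is None:
        # GPU unavailable: return status only and leave chat history untouched.
        return {"answer": None, "gpu_status": {"ready": False, "reason": reason}}
    answer = ask_gpu(url, question)
    history_table.put_item(Item={"question": question, "answer": answer})
    return {"answer": answer}


def test_fallback_not_saved_to_history():
    table = MagicMock()
    resolver = MagicMock(return_value=(None, "starting"))  # tuple-returning mock
    gpu = MagicMock()

    response = handle_chat("What did I do?", resolver, table, gpu)

    assert response["answer"] is None
    assert response["gpu_status"]["reason"] == "starting"
    table.put_item.assert_not_called()  # nothing persisted
    gpu.assert_not_called()  # GPU was never queried


def test_answer_saved_when_gpu_ready():
    table = MagicMock()
    resolver = MagicMock(return_value=("http://gpu.example", "ready"))
    gpu = MagicMock(return_value="You went hiking.")

    response = handle_chat("What did I do?", resolver, table, gpu)

    assert response["answer"] == "You went hiking."
    table.put_item.assert_called_once()
```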

Files Modified

File	Changes
`main/server/api/memories/chat/app.py`	Structured GPU status, `_get_gpu_status()`, `_build_gpu_unavailable_response()`
`main/app/app/chat.tsx`	GPU status display, auto-retry logic
`main/app/lib/api/memory/chatWithMemory.ts`	`GPUStatus` type, response shape
Tests	GPU status tests, updated mocks

Behavior Examples

Before

User: "What did I do?"
→ [waiting 60s...]
← "Chat is not available right now — the GPU worker is starting up..."
  (saved to chat history)

After

User: "What did I do?"
→ [waiting 1s...]
← {
  "answer": null,
  "gpu_status": {
    "ready": false,
    "reason": "starting",
    "retry_after_seconds": 10,
    "message": "GPU is starting up. Please try again in 10 seconds."
  }
}
(displayed as yellow box, auto-retries, NOT saved)

Verification

[x] Backend returns structured GPU status
[x] Frontend displays GPU status with user-friendly message
[x] Automatic retry works for transient failures
[x] Fallback NOT saved to chat history
[x] Tests updated for GPU status responses
[x] Edge cases handled

Audit Log

ID	Action	Note
1	Investigation	Identified opaque fallback message as root cause
2	Design	Structured GPU status response with reason codes
3	Backend Implementation	Enhanced GPU status tracking and response
4	Frontend Implementation	GPU status display and auto-retry
5	Test Updates	GPU status and retry behavior tests