GPU Readiness Flow - Ambiguous "Waiting" Message and Unclear Status
- Date: 2026-05-09
- Status: fixed
- Severity: high
- User Impact: Chat feature unreliable; users cannot tell the GPU status or when to retry
- Affected Components: Chat frontend, Chat Lambda, GPU worker resolution
Problem
Symptom: The user receives "Chat is not available right now — the GPU worker is starting up. Please try again in a few minutes." with no clarity on:
- Whether the GPU is actually unavailable or just slow
- How long to wait before retrying
- What the actual GPU state is (stopped, starting, terminated, unhealthy)
- Whether retrying immediately is futile or might succeed
Root Causes
- Opaque Fallback Message: The Lambda returns the same generic fallback for every GPU-unavailable state.
- Fallback Saved to Chat: The fallback message is saved to DynamoDB as if it were a bot answer.
- No Retry Guidance: No retry_after_seconds value or reason code is sent to the frontend.
- Single Health Check: GPU resolution happens only once per request.
- GPU State Not Tracked: The code doesn't distinguish between pending, stopped, running, and terminated states.
- Frontend Cannot Display Status: There is no UI component to show GPU warm-up progress.
Solution
Backend (app.py)
- Return a structured gpu_status object with ready, reason, retry_after_seconds, and message
- Modify _resolve_gpu_url() to return a tuple that includes the status reason
- Do NOT save the GPU-unavailable message to DynamoDB
- Add logging for each GPU stage
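A minimal sketch of how a worker state plus a health-check result could be turned into the structured gpu_status object, assuming the four states listed under Root Causes and a single boolean health check. The names get_gpu_status and build_gpu_unavailable_response are hypothetical stand-ins loosely modeled on the _get_gpu_status() and _build_gpu_unavailable_response() helpers named in the Files Modified table, whose real signatures this report does not show. Only the "starting" reason and its 10-second hint come from the After example; the other reason codes, delays, and messages are placeholders.

```python
import logging
from typing import Optional, TypedDict

logger = logging.getLogger(__name__)


class GPUStatus(TypedDict):
    ready: bool
    reason: str
    retry_after_seconds: Optional[int]
    message: str


# Hypothetical mapping from a raw worker state to a reason code.
STATE_TO_REASON = {
    "pending": "starting",
    "stopped": "stopped",
    "terminated": "terminated",
}

# Placeholder retry hints; only "starting" -> 10 s is taken from this report.
RETRY_AFTER_SECONDS = {
    "starting": 10,
    "unhealthy": 15,
    "stopped": 60,
    "terminated": None,  # retrying will not help until the worker is replaced
    "unknown": 30,
}

MESSAGES = {
    "starting": "GPU is starting up.",
    "unhealthy": "GPU is running but failed its health check.",
    "stopped": "GPU is stopped.",
    "terminated": "GPU worker has been terminated.",
    "unknown": "GPU state could not be determined.",
}


def get_gpu_status(state: str, healthy: bool) -> GPUStatus:
    """Classify a worker state plus one health-check result into a GPUStatus."""
    if state == "running":
        reason = "ready" if healthy else "unhealthy"
    else:
        # Any state not listed above maps to "unknown".
        reason = STATE_TO_REASON.get(state, "unknown")

    # One log line per GPU check, so each stage is visible in the Lambda logs.
    logger.info("GPU check: state=%s healthy=%s -> reason=%s", state, healthy, reason)

    if reason == "ready":
        return {"ready": True, "reason": "ready",
                "retry_after_seconds": None, "message": "GPU is ready."}

    retry = RETRY_AFTER_SECONDS[reason]
    message = MESSAGES[reason]
    if retry is not None:
        message += f" Please try again in {retry} seconds."
    return {"ready": False, "reason": reason,
            "retry_after_seconds": retry, "message": message}


def build_gpu_unavailable_response(status: GPUStatus) -> dict:
    """Body returned instead of a bot answer; the caller must not persist it."""
    return {"answer": None, "gpu_status": status}
```

With these placeholder tables, a "pending" worker produces exactly the gpu_status shown in the After example, while a "terminated" worker returns retry_after_seconds of None so the client knows not to retry.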
Frontend (chat.tsx)
- Display GPU status in a centered yellow box
- Auto-retry for transient failures (up to 5 retries)
- Show "Retrying automatically..." during retry
- Reset retry count on success
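The retry rules above can be sketched as a small state holder. The real logic lives in chat.tsx (TypeScript), which this report does not reproduce, so this Python version only illustrates the policy: retry transient failures after retry_after_seconds, stop after 5 retries, and reset the counter on success. The AutoRetry class and the TRANSIENT_REASONS set are hypothetical; the report does not say which reason codes count as transient.

```python
from typing import Optional

# Assumption: which reason codes are "transient" is not stated in the report.
TRANSIENT_REASONS = frozenset({"starting", "unhealthy"})


class AutoRetry:
    """Tracks the auto-retry budget across chat requests (up to 5 retries)."""

    def __init__(self, max_retries: int = 5) -> None:
        self.max_retries = max_retries
        self.retry_count = 0

    def on_response(self, response: dict) -> Optional[int]:
        """Return seconds to wait before retrying, or None to stop retrying."""
        status = response.get("gpu_status") or {}
        if response.get("answer") is not None or status.get("ready"):
            self.retry_count = 0  # reset the budget on success
            return None
        if status.get("reason") not in TRANSIENT_REASONS:
            return None  # e.g. terminated: keep the yellow box, no auto-retry
        if self.retry_count >= self.max_retries:
            return None  # budget exhausted
        delay = status.get("retry_after_seconds")
        if not isinstance(delay, int) or delay <= 0:
            return None  # no usable hint from the backend
        self.retry_count += 1
        return delay
```

A caller would show "Retrying automatically..." whenever on_response returns a delay and resend the message after that many seconds; a None while answer is still null means the yellow status box stays up without further retries.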
Types (chatWithMemory.ts)
- Export a GPUStatus type with a reason enum
- Make ChatWithMemoryResponse.answer nullable (null when the GPU is unavailable)
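For readers following the Python backend, the response contract can be mirrored with TypedDict and Literal. This is a sketch, not the chatWithMemory.ts definitions, which this report does not show: apart from "starting", the reason values are hypothetical, and parse_chat_response is an illustrative helper that checks the implied invariant that a null answer always comes with a non-ready gpu_status.

```python
from typing import Literal, Optional, TypedDict, cast, get_args

# Hypothetical reason enum; only "starting" is confirmed by this report.
GPUReason = Literal["ready", "starting", "stopped", "terminated", "unhealthy", "unknown"]


class GPUStatus(TypedDict):
    ready: bool
    reason: GPUReason
    retry_after_seconds: Optional[int]
    message: str


class ChatWithMemoryResponse(TypedDict, total=False):
    answer: Optional[str]  # None when the GPU is unavailable
    gpu_status: GPUStatus


def parse_chat_response(payload: dict) -> ChatWithMemoryResponse:
    """Reject payloads where a null answer lacks a non-ready gpu_status."""
    if payload.get("answer") is None:
        status = payload.get("gpu_status")
        if status is None or status.get("ready"):
            raise ValueError("null answer requires a non-ready gpu_status")
        if status.get("reason") not in get_args(GPUReason):
            raise ValueError(f"unknown reason: {status.get('reason')!r}")
    return cast(ChatWithMemoryResponse, payload)
```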
Tests
- Add GPU status response tests
- Update _resolve_gpu_url mocks to return a tuple
- Test that the fallback is NOT saved to history
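The two test changes above can be sketched with unittest.mock. To keep the example self-contained, the hypothetical handle_chat below receives its GPU resolver, model call, and history writer as arguments; in the real suite one would instead patch _resolve_gpu_url in app.py with mock.patch. The (url, status) tuple shape is an assumption, since the report only says the function now returns a tuple carrying the status reason, and the success body is assumed to contain just the answer.

```python
from typing import Callable, Optional, Tuple
from unittest import mock

GpuResolution = Tuple[Optional[str], dict]  # assumed (url, status) shape


def handle_chat(question: str,
                resolve_gpu_url: Callable[[], GpuResolution],
                ask_model: Callable[[str, str], str],
                save_message: Callable[[str, str], None]) -> dict:
    url, status = resolve_gpu_url()
    if not status["ready"]:
        # GPU unavailable: return the status and do NOT write to chat history.
        return {"answer": None, "gpu_status": status}
    answer = ask_model(url, question)
    save_message(question, answer)
    return {"answer": answer}


def test_fallback_not_saved() -> None:
    status = {"ready": False, "reason": "starting", "retry_after_seconds": 10,
              "message": "GPU is starting up. Please try again in 10 seconds."}
    resolve = mock.Mock(return_value=(None, status))
    ask, save = mock.Mock(), mock.Mock()

    response = handle_chat("What did I do?", resolve, ask, save)

    assert response == {"answer": None, "gpu_status": status}
    ask.assert_not_called()
    save.assert_not_called()


def test_answer_saved_when_ready() -> None:
    status = {"ready": True, "reason": "ready", "retry_after_seconds": None,
              "message": "GPU is ready."}
    url = "http://gpu-worker.example:8000"  # hypothetical worker URL
    resolve = mock.Mock(return_value=(url, status))
    ask = mock.Mock(return_value="You went for a run.")
    save = mock.Mock()

    response = handle_chat("What did I do?", resolve, ask, save)

    assert response == {"answer": "You went for a run."}
    ask.assert_called_once_with(url, "What did I do?")
    save.assert_called_once_with("What did I do?", "You went for a run.")
```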
Files Modified
| File | Changes |
|---|---|
| main/server/api/memories/chat/app.py | Structured GPU status, _get_gpu_status(), _build_gpu_unavailable_response() |
| main/app/app/chat.tsx | GPU status display, auto-retry logic |
| main/app/lib/api/memory/chatWithMemory.ts | GPUStatus type, response shape |
| Tests | GPU status tests, updated mocks |
Behavior Examples
Before
User: "What did I do?"
→ [waiting 60s...]
← "Chat is not available right now — the GPU worker is starting up..."
(saved to chat history)
After
User: "What did I do?"
→ [waiting 1s...]
← {
"answer": null,
"gpu_status": {
"ready": false,
"reason": "starting",
"retry_after_seconds": 10,
"message": "GPU is starting up. Please try again in 10 seconds."
}
}
(displayed as yellow box, auto-retries, NOT saved)
Verification
- [x] Backend returns structured GPU status
- [x] Frontend displays GPU status with user-friendly message
- [x] Automatic retry works for transient failures
- [x] Fallback NOT saved to chat history
- [x] Tests updated for GPU status responses
- [x] Edge cases handled
Audit Log
| ID | Action | Note |
|---|---|---|
| 1 | Investigation | Identified opaque fallback message as root cause |
| 2 | Design | Structured GPU status response with reason codes |
| 3 | Backend Implementation | Enhanced GPU status tracking and response |
| 4 | Frontend Implementation | GPU status display and auto-retry |
| 5 | Test Updates | GPU status and retry behavior tests |