
GPU Readiness Flow - Ambiguous "Waiting" Message and Unclear Status

Metadata

  • Date: 2026-05-09
  • Status: fixed
  • Severity: high
  • User Impact: Chat feature unreliable; users cannot tell GPU status or when to retry
  • Affected Components: Chat frontend, Chat Lambda, GPU worker resolution

Problem

Symptom: The user receives "Chat is not available right now — the GPU worker is starting up. Please try again in a few minutes." with no clarity on:
  • Whether the GPU is actually unavailable or just slow
  • How long to wait before retrying
  • What the actual GPU state is (stopped, starting, terminated, unhealthy)
  • Whether retrying immediately is futile or might succeed

Root Causes

  1. Opaque Fallback Message: The Lambda returns the same generic fallback for every GPU-unavailable state.
  2. Fallback Saved to Chat: The fallback message is saved to DynamoDB as if it were a bot answer.
  3. No Retry Guidance: The response sends no retry_after_seconds value or reason code to the frontend.
  4. Single Health Check: GPU resolution runs only one health check per request.
  5. GPU State Not Tracked: The code does not distinguish among pending, stopped, running, and terminated states.
  6. Frontend Cannot Display Status: No UI component shows GPU warm-up progress.
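One way to address root causes 3 and 5 together is to classify the worker before responding. The sketch below assumes the GPU worker is an EC2-style instance, so its state is one of pending, running, stopping, stopped, shutting-down, or terminated, and that a separate health-check result is available. The function classify_gpu_state and the RETRY_AFTER and DESCRIPTIONS tables are hypothetical; apart from the 10-second hint for "starting", taken from the example response later in this issue, the retry values are illustrative.

```python
from typing import Optional, TypedDict


class GPUStatus(TypedDict):
    ready: bool
    reason: str
    retry_after_seconds: Optional[int]
    message: str


# Hypothetical retry hints in seconds. Only "starting" -> 10 comes from the issue's example.
# A missing entry (e.g. "terminated") means an immediate retry is assumed to be futile.
RETRY_AFTER = {"starting": 10, "unhealthy": 30, "stopped": 60}

# Hypothetical user-facing descriptions per reason code.
DESCRIPTIONS = {
    "starting": "GPU is starting up.",
    "unhealthy": "GPU is running but failed its health check.",
    "stopped": "GPU is stopped.",
    "terminated": "GPU worker has been terminated.",
}


def classify_gpu_state(instance_state: str, healthy: bool) -> GPUStatus:
    """Map an EC2-style instance state and a health-check result to a GPU status."""
    if instance_state == "running":
        reason = "ready" if healthy else "unhealthy"
    elif instance_state == "pending":
        reason = "starting"
    elif instance_state in ("stopping", "stopped"):
        reason = "stopped"
    elif instance_state in ("shutting-down", "terminated"):
        reason = "terminated"
    else:
        reason = "unhealthy"  # assumption: treat unknown states conservatively

    if reason == "ready":
        return {"ready": True, "reason": "ready", "retry_after_seconds": None, "message": "GPU is ready."}

    retry = RETRY_AFTER.get(reason)
    message = DESCRIPTIONS[reason]
    if retry is not None:
        message += f" Please try again in {retry} seconds."
    return {"ready": False, "reason": reason, "retry_after_seconds": retry, "message": message}
```

Returning retry_after_seconds as None for a terminated worker lets the frontend tell the user that retrying right away will not help, which is one of the ambiguities listed under Symptom.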

Solution

Backend (app.py)

  • Return a structured gpu_status object with ready, reason, retry_after_seconds, and message fields
  • Modify _resolve_gpu_url() to return a tuple that includes the status reason
  • Do NOT save the GPU-unavailable message to DynamoDB
  • Add logging at each GPU resolution stage
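The backend changes can be sketched as a handler that returns early, before any DynamoDB write. The issue names _resolve_gpu_url() and _build_gpu_unavailable_response() but does not show their signatures, so this sketch uses stand-ins with assumed ones: resolve_gpu_url returns a (url, reason) tuple whose url is None when the GPU is not ready, and handle_chat, ask_gpu, save_to_history, and RETRY_AFTER are hypothetical. The {"statusCode", "body"} shape assumes an API Gateway proxy integration.

```python
import json
import logging
from typing import Callable, Optional, Tuple

logger = logging.getLogger("chat")

# Hypothetical retry hints in seconds; only "starting" -> 10 comes from the issue.
RETRY_AFTER = {"starting": 10, "unhealthy": 30}


def build_gpu_unavailable_response(reason: str) -> dict:
    """Return answer=null plus a structured gpu_status instead of a fallback answer."""
    retry = RETRY_AFTER.get(reason)  # None: no retry hint for this reason
    message = "GPU is starting up." if reason == "starting" else f"GPU is unavailable ({reason})."
    if retry is not None:
        message += f" Please try again in {retry} seconds."
    body = {
        "answer": None,
        "gpu_status": {
            "ready": False,
            "reason": reason,
            "retry_after_seconds": retry,
            "message": message,
        },
    }
    # Assumption: the frontend reads gpu_status from a 200 response body.
    return {"statusCode": 200, "body": json.dumps(body)}


def handle_chat(
    question: str,
    resolve_gpu_url: Callable[[], Tuple[Optional[str], str]],
    ask_gpu: Callable[[str, str], str],
    save_to_history: Callable[[str, str], None],
) -> dict:
    url, reason = resolve_gpu_url()  # (url or None, reason code)
    logger.info("gpu resolution: reason=%s url_present=%s", reason, url is not None)
    if url is None:
        # The unavailable notice is not a bot answer, so nothing is persisted.
        return build_gpu_unavailable_response(reason)
    answer = ask_gpu(url, question)
    save_to_history(question, answer)  # only real answers reach chat history
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```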

Frontend (chat.tsx)

  • Display the GPU status in a centered yellow box
  • Retry automatically on transient failures (up to 5 retries)
  • Show "Retrying automatically..." while a retry is pending
  • Reset the retry count on success
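The auto-retry rules can be expressed compactly. The real logic lives in chat.tsx; this sketch uses Python to match the other examples. send_with_auto_retry, the set of transient reasons, and the 10-second fallback delay are assumptions, while the 5-retry cap comes from the list above.

```python
import time
from typing import Callable

MAX_RETRIES = 5  # from the issue: up to 5 automatic retries
# Assumption: which reason codes count as transient and are worth retrying automatically.
TRANSIENT_REASONS = {"starting", "unhealthy"}
DEFAULT_DELAY_SECONDS = 10  # hypothetical fallback when retry_after_seconds is missing


def send_with_auto_retry(
    send: Callable[[], dict],
    on_retry: Callable[[dict, int], None],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until an answer arrives, a non-transient status appears, or retries run out."""
    retries = 0
    while True:
        response = send()
        if response.get("answer") is not None:
            return response  # success; the next call starts again at zero retries
        status = response.get("gpu_status") or {}
        if status.get("reason") not in TRANSIENT_REASONS or retries >= MAX_RETRIES:
            return response  # caller shows the yellow status box and stops retrying
        retries += 1
        on_retry(status, retries)  # e.g. show "Retrying automatically..."
        sleep(status.get("retry_after_seconds") or DEFAULT_DELAY_SECONDS)
```

Because the counter is local to each call, every new message starts with a fresh budget; a stateful UI component gets the same behavior only by explicitly resetting its stored count on success, as listed above.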

Types (chatWithMemory.ts)

  • Export a GPUStatus type with a reason enum
  • ChatWithMemoryResponse.answer can be null when the GPU is unavailable
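The response contract can be mirrored on the Python side to check what the Lambda returns against what chatWithMemory.ts expects. GPUStatus and ChatWithMemoryResponse below are Python analogues of the TypeScript types, not the types themselves; the reason values are assumed from the states named under Symptom (plus "ready"), and parse_chat_response is a hypothetical helper.

```python
from typing import Literal, Optional, TypedDict, cast, get_args

# Assumption: reason values drawn from the states listed under Symptom, plus "ready".
GPUReason = Literal["ready", "starting", "stopped", "terminated", "unhealthy"]


class GPUStatus(TypedDict):
    ready: bool
    reason: GPUReason
    retry_after_seconds: Optional[int]
    message: str


class ChatWithMemoryResponse(TypedDict, total=False):
    answer: Optional[str]  # null when the GPU is unavailable
    gpu_status: GPUStatus


def parse_chat_response(data: dict) -> ChatWithMemoryResponse:
    """Reject payloads that break the contract: a null answer needs a not-ready gpu_status."""
    status = data.get("gpu_status")
    if status is not None and status.get("reason") not in get_args(GPUReason):
        raise ValueError(f"unknown GPU reason: {status.get('reason')!r}")
    if data.get("answer") is None and (status is None or status.get("ready")):
        raise ValueError("answer is null but gpu_status does not explain why")
    return cast(ChatWithMemoryResponse, data)
```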

Tests

  • Add GPU status response tests
  • Update _resolve_gpu_url mocks to return a tuple
  • Test that the fallback is NOT saved to history
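The mock update and the no-save test might look like this with the standard library's unittest.mock. The real module path and signatures are not given here, so the tests target ChatApp, a hypothetical stand-in whose _resolve_gpu_url mirrors the new (url, reason) tuple; _ask_gpu and _save_to_history are invented names.

```python
import json
import unittest
from unittest import mock


class ChatApp:
    """Hypothetical stand-in for the chat Lambda; only the control flow matters here."""

    def _resolve_gpu_url(self):
        raise NotImplementedError  # replaced by a mock in every test

    def _ask_gpu(self, url, question):
        raise NotImplementedError  # replaced by a mock in every test

    def _save_to_history(self, question, answer):
        raise NotImplementedError  # replaced by a mock in every test

    def handler(self, question):
        url, reason = self._resolve_gpu_url()  # now a (url, reason) tuple
        if url is None:
            status = {"ready": False, "reason": reason}  # other fields omitted for brevity
            return {"statusCode": 200, "body": json.dumps({"answer": None, "gpu_status": status})}
        answer = self._ask_gpu(url, question)
        self._save_to_history(question, answer)
        return {"statusCode": 200, "body": json.dumps({"answer": answer})}


class GpuStatusTests(unittest.TestCase):
    def setUp(self):
        self.app = ChatApp()

    def test_unavailable_gpu_is_not_saved(self):
        with mock.patch.object(self.app, "_resolve_gpu_url", return_value=(None, "starting")), \
             mock.patch.object(self.app, "_save_to_history") as save:
            body = json.loads(self.app.handler("What did I do?")["body"])
        self.assertIsNone(body["answer"])
        self.assertEqual(body["gpu_status"]["reason"], "starting")
        save.assert_not_called()

    def test_ready_gpu_answer_is_saved(self):
        with mock.patch.object(self.app, "_resolve_gpu_url", return_value=("http://gpu.example", "ready")), \
             mock.patch.object(self.app, "_ask_gpu", return_value="You went hiking."), \
             mock.patch.object(self.app, "_save_to_history") as save:
            body = json.loads(self.app.handler("What did I do?")["body"])
        self.assertEqual(body["answer"], "You went hiking.")
        save.assert_called_once_with("What did I do?", "You went hiking.")
```

Run it with python -m unittest. An old-style mock that still returns a bare URL string would fail at the url, reason unpacking, which is why the mocks need updating.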

Files Modified

File                                         Changes
main/server/api/memories/chat/app.py         Structured GPU status, _get_gpu_status(), _build_gpu_unavailable_response()
main/app/app/chat.tsx                        GPU status display, auto-retry logic
main/app/lib/api/memory/chatWithMemory.ts    GPUStatus type, response shape
Tests                                        GPU status tests, updated mocks

Behavior Examples

Before

User: "What did I do?"
→ [waiting 60s...]
← "Chat is not available right now — the GPU worker is starting up..."
  (saved to chat history)

After

User: "What did I do?"
→ [waiting 1s...]
← {
  "answer": null,
  "gpu_status": {
    "ready": false,
    "reason": "starting",
    "retry_after_seconds": 10,
    "message": "GPU is starting up. Please try again in 10 seconds."
  }
}
(displayed in a yellow box, retried automatically, NOT saved to chat history)

Verification

  • [x] Backend returns structured GPU status
  • [x] Frontend displays GPU status with user-friendly message
  • [x] Automatic retry works for transient failures
  • [x] Fallback NOT saved to chat history
  • [x] Tests updated for GPU status responses
  • [x] Edge cases handled

Audit Log

ID  Action                    Note
1   Investigation             Identified opaque fallback message as root cause
2   Design                    Structured GPU status response with reason codes
3   Backend Implementation    Enhanced GPU status tracking and response
4   Frontend Implementation   GPU status display and auto-retry
5   Test Updates              GPU status and retry behavior tests