Costco Context Missing From Chat — Audio-Only Segments Never Vectorized
Metadata
- Date: 2026-05-27
- Status: resolved
- Severity: high
- User: a408c4d8-60d1-7085-0e89-8b28e7102455 (benjaminsl2000@gmail.com)
- Related Issues: docs/bugs/2026-05-07-chat-no-context-gpu-unavailable.md
- Owner: debugger-agent
Symptom
User asked: "Does it have anything about Costco? Can you tell me..." in chat.
The GPU was healthy and the ReasoningAgent ran successfully, but it returned a 57-character answer, too short to contain any Costco context. Segment records for two Costco sessions exist at the ORM level, but they were invisible to all three retrievers.
Log Evidence
CloudWatch: /aws/lambda/server-MemoriesChatFunction-OkmZYszwOXzJ
memories_chat chat_dispatcher_start question_preview="Does it have anything about Costco? Can you tell m" ...
memories_chat chat_impl_start ...
memories_chat gpu_retry_healthy elapsed_ms=185 retries=0
memories_chat chat_gpu_resolved gpu_instance_id="i-0136563e6b3c049ec" ...
chat agent_answer_complete answer_length=57
GPU was healthy (185ms health check) and the answer was only 57 chars — a "no context" fallback answer from the LLM.
CloudWatch: /aws/lambda/server-IngestWindowFunction-44v8BXyEGwOz
ingest_window completed enriched=false segment_id="172176d8-8abd-49c5-8e8f-434daad7509e" title="Heading to Costco"
ingest_window completed enriched=false segment_id="e413ac1a-64ee-4990-b091-c1cb3ea0cc38" title="Discussing Costco membership and food"
Both Costco segments completed with enriched=false — the audio-only path was taken for both.
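To find every segment that took the audio-only path, the ingest log lines can be filtered for enriched=false. The following is a minimal sketch that assumes the lines look exactly like the excerpt above (space-separated key=value pairs, with quoted strings for free text). Real CloudWatch exports may carry timestamps or JSON wrappers, and the helper names `parse_fields` and `audio_only_segments` are made up for illustration.

```python
import re

# Sample lines copied from the IngestWindowFunction excerpt above.
LOG_LINES = [
    'ingest_window completed enriched=false segment_id="172176d8-8abd-49c5-8e8f-434daad7509e" title="Heading to Costco"',
    'ingest_window completed enriched=false segment_id="e413ac1a-64ee-4990-b091-c1cb3ea0cc38" title="Discussing Costco membership and food"',
]

# Matches key=value where the value is a quoted string or a bare token.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_fields(line):
    """Return the key=value pairs of one log line as a dict of strings."""
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def audio_only_segments(lines):
    """Return (segment_id, title) for every completed window with enriched=false."""
    found = []
    for line in lines:
        if "ingest_window completed" not in line:
            continue
        fields = parse_fields(line)
        if fields.get("enriched") == "false":
            found.append((fields.get("segment_id"), fields.get("title")))
    return found


for segment_id, title in audio_only_segments(LOG_LINES):
    print(segment_id, title)
```

Run against a longer log export, the same filter would list every segment that currently has a transcript but no triples or embeddings.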
Root Cause
The ingest_window.lambda_handler takes two distinct paths based on whether frames are present:
Path A (with frames, GPU path):
1. Calls GPU: caption → NER → entities → episodic triples → visual embedding
2. Segment ends up with: caption, entities, WorldMMTriple records (memory_type="episodic"), visual embeddings
3. All three retrievers can find it: episodic (via triples + PPR), semantic (via semantic triples built over time), visual (via pgvector embedding search)

Path B (audio-only, enriched=false):
1. Transcribes audio with Groq Whisper
2. Generates a title from the transcript
3. Calls update_segment(processing_status="complete", title=..., transcript=...)
4. No GPU call, no NER, no entities, no triples, no embeddings
5. Segment has only transcript text in the DB

Result: Audio-only segments are completely invisible to all three retrievers:
- Episodic retriever (retrieve_episodic_with_embeddings): builds entity graphs from WorldMMTriple records. Audio-only segments have no triples, so they never appear in any graph.
- Semantic retriever (retrieve_semantic_with_embeddings): searches semantic triples. Same issue: no triples exist.
- Visual retriever (retrieve_visual): searches worldmm_visual_embeddings. No embedding was ever stored.
The load_episodic_graphs query in db_loader.py filters on WorldMMTriple.memory_type == "episodic". If no triples exist for a segment (which is the case for all audio-only segments), it is entirely absent from the graph loaded by handle_chat.
The ORM record exists (WorldMMSegment with transcript text and processing_status="complete"), so from an ORM standpoint the memory "exists" — but it has no vector representation and no graph edges, making it unreachable by any retriever.
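The gap between "the row exists" and "a retriever can reach it" can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the `Segment` and `Triple` dataclasses and `reachable_segment_ids` are hypothetical stand-ins, not the real WorldMMSegment, WorldMMTriple, or load_episodic_graphs. The only behavior carried over from the text is that the graph is built from triples filtered by memory_type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical stand-in for a WorldMMSegment row.
    segment_id: str
    transcript: str
    processing_status: str = "complete"


@dataclass(frozen=True)
class Triple:
    # Hypothetical stand-in for a WorldMMTriple row.
    segment_id: str
    subject: str
    predicate: str
    obj: str
    memory_type: str


def reachable_segment_ids(triples, memory_type="episodic"):
    """Segments reachable through the graph: only those that own a matching triple."""
    return {t.segment_id for t in triples if t.memory_type == memory_type}


segments = [
    Segment("seg-with-frames", "walking through the park"),
    Segment("seg-audio-only", "we are heading to Costco"),
]
# Only the enriched segment ever received a triple.
triples = [Triple("seg-with-frames", "user", "visited", "park", "episodic")]

reachable = reachable_segment_ids(triples)
invisible = [s.segment_id for s in segments if s.segment_id not in reachable]
print(invisible)
```

Both segments are "complete", yet only the one with a triple is reachable, which matches the behavior described for the Costco segments.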
Hypothesis Verdict
- Hypothesis 1 (vector DB search didn't find it): TRUE. The vector DB has no embedding for these segments because they were never vectorized.
- Hypothesis 2 (audio input never vectorized): TRUE. The audio-only ingest path (enriched=false) intentionally skips all GPU enrichment (captions, NER, triples, embeddings).
Both hypotheses are correct: the embedding was never stored, and therefore the vector search cannot find it. The deeper root cause is that the episodic retriever has no transcript-based fallback path for segments that have text but no GPU enrichment.
Fix
The fix adds transcript-embedding as a fallback for audio-only segments in the ingest_window path.
When enriched=false (audio-only window with a non-empty transcript), after saving the segment the Lambda also:
1. Calls TextEmbedder (via Groq, no GPU needed) to embed the transcript text
2. Stores the resulting text embedding as a WorldMMEntity for the segment's transcript content, creating a synthetic entity named after the title/session
3. Creates a WorldMMTriple (memory_type="episodic") that links the synthetic entity to the segment, making the segment visible to the episodic graph loader
File changed: main/server/worldmm/pipeline/ingest_window.py
The key change is in lambda_handler at the enriched=false early return block (around line 638):
    # Before (audio-only early return, no vectorization):
    if not frames_b64:
        update_segment(segment_id, processing_status="complete", title=title, transcript=transcript or None)
        return {"status": "ok", "segmentId": segment_id, "enriched": False}

    # After (audio-only path now embeds transcript and creates synthetic triple):
    if not frames_b64:
        update_segment(segment_id, processing_status="complete", title=title, transcript=transcript or None)
        if transcript:
            _embed_and_index_transcript(segment_id, user_id, transcript, title)
        return {"status": "ok", "segmentId": segment_id, "enriched": False}
New helper _embed_and_index_transcript uses Groq text embeddings (not GPU) to create a synthetic entity + episodic triple, making the segment discoverable via the episodic retriever's seed-entity cosine similarity path.
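The shape of such a helper might look roughly like the sketch below. It is not the code in ingest_window.py. The embedder is replaced with a deterministic toy (`toy_embed`), persistence is an in-memory `InMemoryStore`, and the "mentioned_in" predicate and the dict field names are assumptions made for illustration. Only the argument order (segment_id, user_id, transcript, title) and the entity-plus-episodic-triple outcome come from the description above.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    # Hypothetical stand-in for the entity and triple tables.
    entities: list = field(default_factory=list)
    triples: list = field(default_factory=list)


def toy_embed(text, dim=8):
    """Toy embedder (NOT the real TextEmbedder): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def embed_and_index_transcript(segment_id, user_id, transcript, title, *, store, embed=toy_embed):
    """Create one synthetic entity and one episodic triple for an audio-only segment."""
    entity = {
        "user_id": user_id,
        "name": title or f"segment:{segment_id}",  # fall back if no title was generated
        "embedding": embed(transcript),
    }
    store.entities.append(entity)
    store.triples.append({
        "user_id": user_id,
        "segment_id": segment_id,
        "subject": entity["name"],
        "predicate": "mentioned_in",  # hypothetical predicate name
        "object": segment_id,
        "memory_type": "episodic",    # the value the graph loader filters on
    })
    return entity


store = InMemoryStore()
embed_and_index_transcript(
    "seg-1", "user-1", "we are heading to costco now", "Heading to Costco", store=store
)
print(len(store.entities), len(store.triples))
```

Injecting the store and embedder as keyword arguments keeps the sketch testable without network access. The real helper presumably resolves both internally.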
Test
File: main/server/tests/unit/test_ingest_window_audio_transcript_index.py
Test confirms:
1. Audio-only window with transcript calls _embed_and_index_transcript
2. A WorldMMEntity and WorldMMTriple (memory_type="episodic") are created
3. The triple links the entity to the segment_id
4. The test fails before the fix (no triples created) and passes after
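The first check can be sketched with unittest.mock. The contents of the real test file are not reproduced here. `handle_window` below is a hypothetical stand-in that mirrors only the patched early-return branch, with its collaborators passed in so they can be mocked.

```python
from unittest.mock import Mock


def handle_window(segment_id, user_id, transcript, title, frames_b64, *,
                  update_segment, index_transcript):
    """Stand-in for the patched audio-only branch; not the real lambda_handler."""
    if not frames_b64:
        update_segment(segment_id, processing_status="complete",
                       title=title, transcript=transcript or None)
        if transcript:
            index_transcript(segment_id, user_id, transcript, title)
        return {"status": "ok", "segmentId": segment_id, "enriched": False}
    raise NotImplementedError("frame (GPU) path omitted in this sketch")


def test_audio_only_with_transcript_is_indexed():
    update, index = Mock(), Mock()
    result = handle_window("seg-1", "user-1", "heading to costco", "Heading to Costco", [],
                           update_segment=update, index_transcript=index)
    index.assert_called_once_with("seg-1", "user-1", "heading to costco", "Heading to Costco")
    assert result == {"status": "ok", "segmentId": "seg-1", "enriched": False}


def test_audio_only_without_transcript_is_not_indexed():
    update, index = Mock(), Mock()
    handle_window("seg-2", "user-1", "", "Untitled", [],
                  update_segment=update, index_transcript=index)
    index.assert_not_called()
    update.assert_called_once_with("seg-2", processing_status="complete",
                                   title="Untitled", transcript=None)
```

The second test covers the empty-transcript case, which the `if transcript:` guard is meant to leave unindexed.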
Verification
After the fix, retriggering the two affected sessions via RetriggerIngestFunction with force=true should cause the Lambda to re-run and call _embed_and_index_transcript, creating synthetic triples and making the Costco segments visible to the episodic retriever. force=true is required because the segments are already "complete"; with force=false they would not be re-enriched.
Retrigger command (to be run manually after deploy):
    {
      "sessionId": "<costco-session-id>",
      "userId": "a408c4d8-60d1-7085-0e89-8b28e7102455",
      "force": true
    }
The two segment IDs are:
- 172176d8-8abd-49c5-8e8f-434daad7509e ("Heading to Costco")
- e413ac1a-64ee-4990-b091-c1cb3ea0cc38 ("Discussing Costco membership and food")