Costco Context Missing From Chat — Audio-Only Segments Never Vectorized

Metadata

  • Date: 2026-05-27
  • Status: resolved
  • Severity: high
  • User: a408c4d8-60d1-7085-0e89-8b28e7102455 (benjaminsl2000@gmail.com)
  • Related Issues: docs/bugs/2026-05-07-chat-no-context-gpu-unavailable.md
  • Owner: debugger-agent

Symptom

User asked: "Does it have anything about Costco? Can you tell me..." in chat.

The GPU was healthy and the ReasoningAgent ran successfully, but it returned a 57-character answer, too short to contain any Costco context. The ORM has segment records for two Costco sessions, but they were completely invisible to all three retrievers.

Log Evidence

CloudWatch: /aws/lambda/server-MemoriesChatFunction-OkmZYszwOXzJ

memories_chat chat_dispatcher_start question_preview="Does it have anything about Costco? Can you tell m" ...
memories_chat chat_impl_start ...
memories_chat gpu_retry_healthy elapsed_ms=185 retries=0
memories_chat chat_gpu_resolved gpu_instance_id="i-0136563e6b3c049ec" ...
chat agent_answer_complete answer_length=57

GPU was healthy (185ms health check) and the answer was only 57 chars — a "no context" fallback answer from the LLM.

CloudWatch: /aws/lambda/server-IngestWindowFunction-44v8BXyEGwOz

ingest_window completed enriched=false segment_id="172176d8-8abd-49c5-8e8f-434daad7509e" title="Heading to Costco"
ingest_window completed enriched=false segment_id="e413ac1a-64ee-4990-b091-c1cb3ea0cc38" title="Discussing Costco membership and food"

Both Costco segments completed with enriched=false — the audio-only path was taken for both.

Root Cause

The ingest_window.lambda_handler takes two distinct paths based on whether frames are present:

Path A (with frames, GPU path):

  1. Calls GPU: caption → NER → entities → episodic triples → visual embedding
  2. Segment ends up with: caption, entities, WorldMMTriple records (memory_type="episodic"), visual embeddings
  3. All three retrievers can find it: episodic (via triples + PPR), semantic (via semantic triples built over time), visual (via pgvector embedding search)

Path B (audio-only, enriched=false):

  1. Transcribes audio with Groq Whisper
  2. Generates a title from the transcript
  3. Calls update_segment(processing_status="complete", title=..., transcript=...)
  4. No GPU call, no NER, no entities, no triples, no embeddings
  5. Segment has only transcript text in the DB

Result: Audio-only segments are completely invisible to all three retrievers:

  • Episodic retriever (retrieve_episodic_with_embeddings): builds entity graphs from WorldMMTriple records. Audio-only segments have no triples, so they never appear in any graph.
  • Semantic retriever (retrieve_semantic_with_embeddings): searches semantic triples. Same issue: no triples exist.
  • Visual retriever (retrieve_visual): searches worldmm_visual_embeddings. No embedding was ever stored.

The load_episodic_graphs query in db_loader.py filters on WorldMMTriple.memory_type == "episodic". If no triples exist for a segment (which is the case for all audio-only segments), it is entirely absent from the graph loaded by handle_chat.

The ORM record exists (WorldMMSegment with transcript text and processing_status="complete"), so from an ORM standpoint the memory "exists" — but it has no vector representation and no graph edges, making it unreachable by any retriever.
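The reachability gap can be illustrated with a small in-memory model. This is only a sketch, not the code in db_loader.py: the Triple dataclass, the episodic_segment_ids function, and the sample segment ids are hypothetical stand-ins. Only the memory_type == "episodic" filter is taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical stand-in for a WorldMMTriple row."""
    segment_id: str
    memory_type: str
    subject: str
    predicate: str
    obj: str


def episodic_segment_ids(triples):
    """Return the ids of segments that contribute at least one episodic triple."""
    return {t.segment_id for t in triples if t.memory_type == "episodic"}


# Two completed segments: one went through the GPU path, one is audio-only.
segments = {
    "seg-with-frames": "transcript text",
    "seg-audio-only": "transcript text mentioning Costco",
}

# Only the GPU path ever wrote triples (sample data, not real rows).
triples = [
    Triple("seg-with-frames", "episodic", "user", "enters", "kitchen"),
]

reachable = episodic_segment_ids(triples)
invisible = sorted(set(segments) - reachable)
print(invisible)  # ['seg-audio-only']
```

The audio-only segment has a transcript and a "complete" status, yet it never enters the set the graph is built from, which matches the behavior seen in chat.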

Hypothesis Verdict

  • Hypothesis 1 (vector DB search didn't find it): TRUE. The vector DB has no embedding for these segments because they were never vectorized.
  • Hypothesis 2 (audio input never vectorized): TRUE. The audio-only ingest path (enriched=false) intentionally skips all GPU enrichment (captions, NER, triples, embeddings).

Both hypotheses are correct: the embedding was never stored, and therefore the vector search cannot find it. The deeper root cause is that the episodic retriever has no transcript-based fallback path for segments that have text but no GPU enrichment.

Fix

The fix adds transcript embedding as a fallback for audio-only segments in the ingest_window path.

When enriched=false (audio-only window with a non-empty transcript), after saving the segment the Lambda also:

  1. Calls TextEmbedder (via Groq, no GPU needed) to embed the transcript text
  2. Stores the resulting text embedding on a synthetic WorldMMEntity that represents the segment's transcript content and is named after the title/session
  3. Creates a WorldMMTriple (memory_type="episodic") that links the synthetic entity to the segment, making the segment visible to the episodic graph loader

File changed: main/server/worldmm/pipeline/ingest_window.py

The key change is in lambda_handler at the enriched=false early return block (around line 638):

# Before (audio-only early return, no vectorization):
if not frames_b64:
    update_segment(segment_id, processing_status="complete", title=title, transcript=transcript or None)
    return {"status": "ok", "segmentId": segment_id, "enriched": False}

# After (audio-only path now embeds transcript and creates synthetic triple):
if not frames_b64:
    update_segment(segment_id, processing_status="complete", title=title, transcript=transcript or None)
    if transcript:
        _embed_and_index_transcript(segment_id, user_id, transcript, title)
    return {"status": "ok", "segmentId": segment_id, "enriched": False}

New helper _embed_and_index_transcript uses Groq text embeddings (not GPU) to create a synthetic entity + episodic triple, making the segment discoverable via the episodic retriever's seed-entity cosine similarity path.
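The body of the helper is not shown in this report, so the following is a minimal sketch of what it might do under stated assumptions. The Entity, Triple, and InMemoryStore classes are hypothetical stand-ins for the real ORM models and database session, embed_text is an injected callable standing in for TextEmbedder, and the predicate name is invented for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a WorldMMEntity row."""
    entity_id: str
    user_id: str
    name: str
    embedding: List[float]


@dataclass
class Triple:
    """Hypothetical stand-in for a WorldMMTriple row."""
    user_id: str
    segment_id: str
    memory_type: str
    subject_entity_id: str
    predicate: str
    obj: str


@dataclass
class InMemoryStore:
    """Replaces the real database session for illustration."""
    entities: List[Entity] = field(default_factory=list)
    triples: List[Triple] = field(default_factory=list)


def embed_and_index_transcript(
    store: InMemoryStore,
    embed_text: Callable[[str], List[float]],
    segment_id: str,
    user_id: str,
    transcript: str,
    title: Optional[str],
) -> Optional[Triple]:
    """Create a synthetic entity plus one episodic triple for an audio-only segment."""
    if not transcript or not transcript.strip():
        return None  # nothing to embed, leave the segment as it is

    # Synthetic entity named after the title, carrying the transcript embedding.
    entity = Entity(
        entity_id=str(uuid.uuid4()),
        user_id=user_id,
        name=title or f"segment {segment_id}",
        embedding=embed_text(transcript),
    )
    store.entities.append(entity)

    # The predicate and object text are assumptions; the report only says the
    # triple links the synthetic entity to the segment.
    triple = Triple(
        user_id=user_id,
        segment_id=segment_id,
        memory_type="episodic",
        subject_entity_id=entity.entity_id,
        predicate="described_in_transcript",
        obj=transcript,
    )
    store.triples.append(triple)
    return triple
```

Injecting embed_text keeps the sketch runnable without network access; in the Lambda the embedding call and the persistence layer would be the real ones.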

Test

File: main/server/tests/unit/test_ingest_window_audio_transcript_index.py

Test confirms:

  1. Audio-only window with transcript calls _embed_and_index_transcript
  2. A WorldMMEntity and WorldMMTriple (memory_type="episodic") are created
  3. The triple links the entity to the segment_id
  4. The test fails before the fix (no triples created) and passes after
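The first check could be expressed roughly as follows. This is not the content of the test file named above: handle_audio_only_window is a simplified, hypothetical copy of the audio-only branch with update_segment and the indexing helper injected as parameters, so that the sketch runs with only the standard library.

```python
from unittest.mock import Mock


def handle_audio_only_window(segment_id, user_id, title, transcript, frames_b64,
                             update_segment, index_transcript):
    """Simplified copy of the audio-only branch with its collaborators injected."""
    if frames_b64:
        raise NotImplementedError("GPU path is out of scope for this sketch")
    update_segment(segment_id, processing_status="complete",
                   title=title, transcript=transcript or None)
    if transcript:
        index_transcript(segment_id, user_id, transcript, title)
    return {"status": "ok", "segmentId": segment_id, "enriched": False}


def test_audio_only_window_with_transcript_is_indexed():
    update_segment, index_transcript = Mock(), Mock()
    result = handle_audio_only_window(
        "seg-1", "user-1", "Heading to Costco", "We are heading to Costco",
        frames_b64=[], update_segment=update_segment,
        index_transcript=index_transcript,
    )
    assert result == {"status": "ok", "segmentId": "seg-1", "enriched": False}
    index_transcript.assert_called_once_with(
        "seg-1", "user-1", "We are heading to Costco", "Heading to Costco")


def test_audio_only_window_without_transcript_is_not_indexed():
    update_segment, index_transcript = Mock(), Mock()
    handle_audio_only_window(
        "seg-2", "user-1", "Untitled", "",
        frames_b64=[], update_segment=update_segment,
        index_transcript=index_transcript,
    )
    # The segment is still marked complete, but nothing is embedded.
    update_segment.assert_called_once_with(
        "seg-2", processing_status="complete", title="Untitled", transcript=None)
    index_transcript.assert_not_called()
```

The second test covers the guard on an empty transcript, which the list above does not mention but which the `if transcript:` check in the fix implies.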

Verification

After the fix is deployed, retriggering the two affected sessions via RetriggerIngestFunction with force=true would cause the Lambda to re-run and call _embed_and_index_transcript, creating synthetic triples and making the Costco segments visible to the episodic retriever. force=true is required because the segments are already "complete" and would otherwise not be re-enriched.

Retrigger command (to be run manually after deploy):

{
  "sessionId": "<costco-session-id>",
  "userId": "a408c4d8-60d1-7085-0e89-8b28e7102455",
  "force": true
}

The two segment IDs are:

  • 172176d8-8abd-49c5-8e8f-434daad7509e ("Heading to Costco")
  • e413ac1a-64ee-4990-b091-c1cb3ea0cc38 ("Discussing Costco membership and food")
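Since two sessions need the same payload, a small builder can reduce copy-paste mistakes such as leaving the placeholder in or omitting force. This is a sketch: build_retrigger_payload is a hypothetical helper, and the session ids still have to be looked up, as they are not recorded in this report.

```python
import json


def build_retrigger_payload(session_id: str, user_id: str, force: bool = True) -> str:
    """Serialize one retrigger request.

    force defaults to True because the affected segments are already
    marked complete.
    """
    if not session_id or session_id.startswith("<"):
        raise ValueError("replace the placeholder with a real session id")
    return json.dumps({"sessionId": session_id, "userId": user_id, "force": force})
```

One way to send the result, assuming boto3 and suitable credentials, would be boto3.client("lambda").invoke(FunctionName=..., Payload=payload.encode()) with the deployed name of RetriggerIngestFunction.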