Bug: Audio Session Missing Subsequent Segment Transcriptions

Metadata

  • Date reported: 2026-05-25
  • Status: Fixed
  • Root cause: .first() query returning only the first segment instead of all
  • Affected files: main/server/api/memories/transcript/app.py
  • Affected components: Transcript API, Transcript Display
  • Symptom: Only the first segment's transcript appears; all subsequent segments are missing

Symptom

When a user views an audio session in the mobile app, the TranscriptDisplay component shows only the first ~30-second segment's transcript. All subsequent segments (windows) of the session have no transcription visible, even though they were transcribed during ingest.

Example: a session with 3 audio windows (90 seconds total):

  • Window 0: "Hello, how are you?" ✓ (visible)
  • Window 1: "I'm doing well thanks." ✗ (missing)
  • Window 2: "Let's grab lunch soon." ✗ (missing)

Root Cause

The transcript API endpoint (/memories/{memory_id}/transcript) uses .first() to retrieve transcripts:

# BUGGY CODE:
segs = (
    db.query(WorldMMSegment)
    .filter(
        WorldMMSegment.user_id == user_id,
        WorldMMSegment.source_session_id == memory_id,
        WorldMMSegment.transcript.isnot(None),
    )
    .order_by(WorldMMSegment.start_time, WorldMMSegment.id)
    .first()  # ← Returns only the first matching segment
)

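When a session is grouped (multiple windows sharing the same source_session_id), several segments match the filter, but .first() limits the result to a single row and returns only the earliest segment (or None if nothing matches). The transcripts of the remaining segments never reach the response.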
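
The difference between fetching one row and fetching all rows can be reproduced without the application's models. The following is a minimal sketch using the standard-library sqlite3 module; the `segments` table and its columns are hypothetical stand-ins for WorldMMSegment, not the real schema, and `LIMIT 1` plus `fetchone()` is used here to approximate what .first() does.

```python
import sqlite3

# Hypothetical stand-in for the WorldMMSegment table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments ("
    "id INTEGER PRIMARY KEY, source_session_id TEXT, "
    "start_time REAL, transcript TEXT)"
)
conn.executemany(
    "INSERT INTO segments (source_session_id, start_time, transcript) "
    "VALUES (?, ?, ?)",
    [
        ("abc-123", 0.0, "Hello, how are you?"),
        ("abc-123", 30.0, "I'm doing well thanks."),
        ("abc-123", 60.0, "Let's grab lunch soon."),
    ],
)

BASE_QUERY = (
    "SELECT transcript FROM segments "
    "WHERE source_session_id = ? AND transcript IS NOT NULL "
    "ORDER BY start_time, id"
)

# Approximates .first(): at most one row comes back.
first_row = conn.execute(BASE_QUERY + " LIMIT 1", ("abc-123",)).fetchone()

# Approximates .all(): every matching row, in chronological order.
all_rows = conn.execute(BASE_QUERY, ("abc-123",)).fetchall()

print(first_row[0])   # transcript of window 0 only
print(len(all_rows))  # number of windows in the session
```

With three windows in the session, the single-row query yields only window 0, which matches the symptom described above.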

Evidence

  1. Schema: WorldMMSegment has a source_window_index field, indicating that a session can have multiple segments. For example:

       source_session_id: "abc-123"
       source_window_index: 0, 1, 2 (three windows)

  2. Ingest pipeline: ingest_window.py correctly creates one segment per window, all with the same source_session_id

  3. Feed API: api/memories/feed/app.py retrieves all segments for a session and concatenates their transcripts:

    for seg in session_segments:
        if seg.transcript:
            raw = seg.transcript.strip()

  4. Transcript API inconsistency: Returns only the first segment, contradicting the feed's multi-segment support

Fix

Changed .first() to .all() and concatenated all transcripts in chronological order:

# FIXED CODE:
segs = (
    db.query(WorldMMSegment)
    .filter(
        WorldMMSegment.user_id == user_id,
        WorldMMSegment.source_session_id == memory_id,
        WorldMMSegment.transcript.isnot(None),
    )
    .order_by(WorldMMSegment.start_time, WorldMMSegment.id)
    .all()  # ← Returns all matching segments
)

# Concatenate all transcripts in order
full_transcript = " ".join(
    seg.transcript.strip() for seg in segs if seg.transcript
)

Files Changed

  • main/server/api/memories/transcript/app.py
      • _find_transcript_segment(): Changed from returning a single segment to returning a list
      • implementation(): Updated to concatenate all transcripts and use the first segment's metadata
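
As an illustration of the "concatenate all transcripts, use the first segment's metadata" approach, here is a hedged sketch. The `Segment` dataclass, the `build_transcript_response` helper, and the response field names are all hypothetical; the actual shape of implementation() is not shown in this report.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Segment:
    """Hypothetical stand-in for a WorldMMSegment row."""
    id: int
    start_time: float
    transcript: Optional[str]


def build_transcript_response(segs: Sequence[Segment]) -> Optional[dict]:
    """Join transcripts in the given order; take metadata from the first segment."""
    if not segs:
        return None  # the caller decides how to report "no transcript"
    # Strip each transcript and skip ones that are empty after stripping,
    # so whitespace-only segments do not produce double spaces.
    parts = [
        s.transcript.strip()
        for s in segs
        if s.transcript and s.transcript.strip()
    ]
    first = segs[0]
    return {
        "segment_id": first.id,          # hypothetical response field
        "start_time": first.start_time,  # hypothetical response field
        "transcript": " ".join(parts),
    }


segs = [
    Segment(1, 0.0, "Hello, how are you? "),
    Segment(2, 30.0, "   "),
    Segment(3, 60.0, "Let's grab lunch soon."),
]
result = build_transcript_response(segs)
print(result["transcript"])
```

One design choice in this sketch differs slightly from the fix shown earlier: filtering on the stripped value also drops whitespace-only transcripts, which a plain truthiness check would let through as empty strings in the joined text.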

Verification

  1. Query a session with multiple transcribed windows
  2. Call GET /memories/{session_id}/transcript
  3. Verify response contains concatenated text from all windows, in chronological order
  4. Spot check: first window's phrase + second window's phrase should both appear in response

Related Files

  • Transcript display: main/app/components/memory/TranscriptDisplay.tsx
  • Feed API: main/server/api/memories/feed/app.py (already supports multi-window sessions)
  • Ingest pipeline: main/server/worldmm/pipeline/ingest_window.py (correctly creates per-window segments)

Follow-up

Consider adding an integration test that requests the transcript of a multi-window session, to catch regressions in transcript concatenation.
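
A regression test along these lines might start from a unit-level check such as the sketch below. It does not call the real endpoint: `concatenate_in_order` is a hypothetical helper that mirrors the fixed query's ordering (start_time, then id), and the tuples are stand-ins for segment rows. A true integration test would instead seed the database and call GET /memories/{session_id}/transcript.

```python
import unittest
from typing import List, Tuple

# (start_time, id, transcript) tuples stand in for segment rows (hypothetical).
Row = Tuple[float, int, str]


def concatenate_in_order(rows: List[Row]) -> str:
    """Mirror the fixed query: order by (start_time, id), then join."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    return " ".join(r[2].strip() for r in ordered if r[2])


class TranscriptConcatenationTest(unittest.TestCase):
    def test_all_windows_present_in_chronological_order(self):
        # Rows are deliberately supplied out of order.
        rows = [
            (60.0, 3, "Let's grab lunch soon."),
            (0.0, 1, "Hello, how are you?"),
            (30.0, 2, "I'm doing well thanks."),
        ]
        self.assertEqual(
            concatenate_in_order(rows),
            "Hello, how are you? I'm doing well thanks. Let's grab lunch soon.",
        )

    def test_single_window_session_is_unchanged(self):
        self.assertEqual(
            concatenate_in_order([(0.0, 1, "Only one.")]),
            "Only one.",
        )
```

Saved as a test module, this can be run with `python -m unittest`.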