Bug: Memory Feed Shows Multiple Tiles for a Single Recording Session

Summary

The memory feed displays one tile per 30-second window segment even though all windows belong to the same recording session. PR #397 (feature/group-feed-by-session) fixed the feed API grouping logic correctly, but two code paths in the ingest layer write segments without source_session_id. The feed's group key, COALESCE(source_session_id, CAST(id AS VARCHAR)), then falls back to each segment's own UUID, so each segment appears as a separate tile.

Symptoms

  • A single 5-minute recording (10 windows) shows 10 tiles in the feed.
  • Each tile displays a different thumbnail corresponding to one 30-second window.
  • The bug reproduces consistently for sessions ingested via ingest_session.py (the offline batch pipeline) and any multiscale/semantic synthetic segments.
  • Sessions ingested via ingest_window.py (the Lambda path) group correctly because that path does pass source_session_id to create_segment.

Root Cause

Three call sites in ingest_session.py, grouped below into two root causes, call create_segment without source_session_id:

Root Cause 1 — _ingest_window (line 189)

# ingest_session.py:189
segment_id = create_segment(
    user_id=user_id,
    start_time=start_iso,
    end_time=end_iso,
    duration_seconds=30,
    caption=caption,
    transcript=window_transcript or None,
    # source_session_id MISSING — defaults to None
)

source_session_id is never passed. Every window segment gets source_session_id=NULL in the DB. The feed's COALESCE falls back to CAST(id AS VARCHAR) for each segment, so each window becomes an independent group key and a separate feed tile.
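The fallback can be reproduced in isolation. The following is a minimal sketch, not the feed API's actual query: it uses an in-memory SQLite database, a hypothetical two-column segments table, and made-up IDs, and it swaps VARCHAR for SQLite's TEXT type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the real segments table.
conn.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, source_session_id TEXT)")

# Three windows written with a session ID, three written with NULL.
conn.executemany(
    "INSERT INTO segments VALUES (?, ?)",
    [
        ("seg-1", "session-A"),
        ("seg-2", "session-A"),
        ("seg-3", "session-A"),
        ("seg-4", None),
        ("seg-5", None),
        ("seg-6", None),
    ],
)

# Same group-key expression as described in the report.
tiles = conn.execute(
    """
    SELECT COALESCE(source_session_id, CAST(id AS TEXT)) AS group_key,
           COUNT(*) AS segment_count
    FROM segments
    GROUP BY COALESCE(source_session_id, CAST(id AS TEXT))
    ORDER BY group_key
    """
).fetchall()

for group_key, segment_count in tiles:
    print(group_key, segment_count)
```

The three segments that carry a session ID collapse into one group, while each NULL segment forms its own group, which matches the one-tile-per-window symptom.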

Root Cause 2 — _run_multiscale (line 372) and _run_semantic_extraction (line 433)

# ingest_session.py:372
create_segment(
    user_id=user_id,
    start_time=...,
    end_time=...,
    duration_seconds=output_duration,
    caption=merged,
    # source_session_id MISSING
)

Merged/semantic segments also omit source_session_id, so they appear as separate tiles as well.

Non-issue confirmed

The ingest_window.py (Lambda) path correctly passes source_session_id=session_id at line 199. The re-delivery (existing) branch at line 177 reuses the existing row (which already has source_session_id) and doesn't need to write it again — this path is correct.

The feed API's session_key_expr.in_(visible_keys) filter is also correct: when a segment has source_session_id=NULL, the COALESCE produces the segment UUID as the key, and that UUID is what goes into visible_keys, so the second query retrieves it correctly (just as a singleton, not grouped).
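To illustrate why the filter is not at fault, here is a rough sketch of a two-query pattern of this kind. The real queries in app.py are not shown in this report, so the table layout, the ordering by most recent start_time, the page limit, and the sample rows are all assumptions made for illustration; plain sqlite3 is used in place of the ORM.

```python
import sqlite3

# Group key as described in the report (VARCHAR becomes TEXT in SQLite).
KEY = "COALESCE(source_session_id, CAST(id AS TEXT))"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments (id TEXT PRIMARY KEY, source_session_id TEXT, start_time TEXT)"
)
conn.executemany(
    "INSERT INTO segments VALUES (?, ?, ?)",
    [
        ("seg-1", "session-A", "2024-01-01T10:00:00"),
        ("seg-2", "session-A", "2024-01-01T10:00:30"),
        ("seg-3", None, "2024-01-01T11:00:00"),  # written without a session ID
    ],
)

# Query 1 (assumed shape): choose the group keys visible on this page.
visible_keys = [
    row[0]
    for row in conn.execute(
        f"SELECT {KEY} FROM segments GROUP BY {KEY} "
        "ORDER BY MAX(start_time) DESC LIMIT 10"
    )
]

# Query 2: fetch every segment whose group key is visible.
placeholders = ", ".join("?" for _ in visible_keys)
segments_by_key = {}
for seg_id, key in conn.execute(
    f"SELECT id, {KEY} FROM segments WHERE {KEY} IN ({placeholders}) "
    "ORDER BY start_time",
    visible_keys,
):
    segments_by_key.setdefault(key, []).append(seg_id)
```

The NULL segment is retrieved under its own ID as a one-element group, so nothing is lost by the filter; the segment is simply never merged with its siblings.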

Reproduction Test

tests/unit/test_ingest_session_missing_source_session_id.py

The test calls _ingest_window with a mock session and verifies that the created WorldMMSegment row has source_session_id set to the session ID (not NULL). It fails before the fix because create_segment is called without the argument.
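The shape of such a test can be sketched as follows. This is not the contents of the actual test file: the two ingest functions are hypothetical stand-ins that take create_segment as an argument, and their signatures are assumptions, since the real _ingest_window signature is not shown here.

```python
from unittest.mock import MagicMock


def ingest_window_before_fix(create_segment, user_id, session_id):
    """Hypothetical stand-in mirroring the buggy call: no source_session_id."""
    return create_segment(user_id=user_id, duration_seconds=30, caption="window")


def ingest_window_after_fix(create_segment, user_id, session_id):
    """Hypothetical stand-in that threads the session ID through."""
    return create_segment(
        user_id=user_id,
        duration_seconds=30,
        caption="window",
        source_session_id=session_id,
    )


def recorded_session_id(ingest_fn):
    """Run ingest_fn against a mock create_segment and return the session ID it received."""
    create_segment = MagicMock(return_value="segment-uuid")
    ingest_fn(create_segment, user_id="user-1", session_id="session-A")
    create_segment.assert_called_once()
    # call_args.kwargs holds the keyword arguments of the most recent call.
    return create_segment.call_args.kwargs.get("source_session_id")


before = recorded_session_id(ingest_window_before_fix)
after = recorded_session_id(ingest_window_after_fix)
```

An assertion that the recorded value equals the session ID fails for the first stand-in, where it is None, and passes for the second.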

Fix

Pass source_session_id in every create_segment call inside ingest_session.py:

  1. _ingest_window: add source_session_id parameter to the function signature and thread it through to create_segment.
  2. ingest_session caller: pass session_id from the metadata dict.
  3. _run_multiscale: add source_session_id parameter and pass it through.
  4. _run_semantic_extraction: add source_session_id parameter and pass it through.
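The four steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual patch: create_segment is a recording stub, the helper signatures and bodies are invented for brevity, and the metadata key names "session_id" and "user_id" are assumed.

```python
from typing import List, Optional

created: List[dict] = []  # every create_segment call is recorded here


def create_segment(*, user_id: str, caption: str, duration_seconds: int,
                   source_session_id: Optional[str] = None) -> str:
    """Stub for the real create_segment; it only records what it was given."""
    created.append({"caption": caption, "source_session_id": source_session_id})
    return f"segment-{len(created)}"


def _ingest_window(user_id: str, caption: str, source_session_id: str) -> str:
    return create_segment(user_id=user_id, caption=caption, duration_seconds=30,
                          source_session_id=source_session_id)


def _run_multiscale(user_id: str, captions: List[str], source_session_id: str) -> str:
    merged = " ".join(captions)
    return create_segment(user_id=user_id, caption=merged,
                          duration_seconds=30 * len(captions),
                          source_session_id=source_session_id)


def _run_semantic_extraction(user_id: str, captions: List[str],
                             source_session_id: str) -> str:
    summary = captions[0]  # placeholder for the real extraction logic
    return create_segment(user_id=user_id, caption=summary,
                          duration_seconds=30 * len(captions),
                          source_session_id=source_session_id)


def ingest_session(metadata: dict, captions: List[str]) -> None:
    session_id = metadata["session_id"]  # assumed key name
    user_id = metadata["user_id"]        # assumed key name
    for caption in captions:
        _ingest_window(user_id, caption, source_session_id=session_id)
    _run_multiscale(user_id, captions, source_session_id=session_id)
    _run_semantic_extraction(user_id, captions, source_session_id=session_id)


ingest_session({"session_id": "session-A", "user_id": "user-1"}, ["walking", "talking"])
session_ids = {row["source_session_id"] for row in created}
```

Making source_session_id a required parameter on the helpers, rather than an optional one defaulting to None, means a future call site that forgets it fails immediately instead of silently writing NULL.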

Evidence

  • ingest_session.py:189: create_segment call missing source_session_id.
  • ingest_session.py:372: create_segment call missing source_session_id.
  • ingest_session.py:433: create_segment call missing source_session_id.
  • ingest_window.py:199: create_segment correctly passes source_session_id.
  • Feed API app.py:157: COALESCE grouping is correct.

Verification

After the fix:

  • test_ingest_session_missing_source_session_id.py passes.
  • The existing test_memories_feed_session_grouping.py continues to pass.
  • A manual feed query for a multi-window session returns exactly 1 tile.

Status

Fixed. Root cause confirmed via failing reproduction test; fix applied; test now passes.