Memory Fetch and Display

Plan Metadata

Plan type: plan
Parent plan: N/A
Depends on: N/A
Status: documentation

System Intent

What is being built: The end-to-end flow that stores processed memory segments in PostgreSQL, serves them through a paginated feed API with presigned S3 image URLs, and renders them in the mobile app as a mixed feed (2-column photo grid for visual items, full-width rows for audio and text items) with a full-screen viewer.
Primary consumer(s): Home screen (app/app/index.tsx), MemoryFeed, MemoryCard, MemoryRow, MemoryViewerModal components.
Boundary: WorldMM pipeline writes memory segments (captions, frame S3 keys, knowledge graph triples, visual embeddings) to PostgreSQL → feed API paginates segments and generates presigned S3 URLs → mobile app fetches via infinite query → renders grid → user taps to open full-screen viewer.

Stage Gate Tracker

[x] Stage 1 Mermaid approved
[x] Stage 2 I/O contracts approved
[x] Stage 3 pseudocode/technical details approved

Revision: renamed url → thumbnail (now optional), added type: "text" | "audio" | "visual" field to MemoryFeedItem.

1. Mermaid Diagram

flowchart TD
    subgraph PIPELINE["WorldMM Ingest Pipeline"]
        GPU["GPU Worker EC2\ngpu_worker/server.py"]:::unchanged
        INGEST["ingest_window.py\nworldmm/pipeline/ingest_window.py"]:::unchanged
    end

    subgraph DB["PostgreSQL + pgvector"]
        SEG[("worldmm_segments\nid, user_id, start_time, end_time,\ncaption, s3_frames_key, transcript,\nsource_session_id, source_window_index")]:::unchanged
        ENT[("worldmm_entities\nid, user_id, surface_form,\ncanonical_name, embedding_json")]:::unchanged
        TRP[("worldmm_triples\nid, segment_id, user_id,\nmemory_type, subject_entity_id, predicate,\nobject_entity_id, object_literal, invalidated_at")]:::unchanged
        EMB[("worldmm_visual_embeddings\nid, segment_id, user_id,\ntimestamp, embedding_json")]:::unchanged
        ORM["worldmm_orm.py\nshared/orm/worldmm_orm.py"]:::unchanged
    end

    subgraph S3["S3"]
        FRAMES[("Frame images\nsessions/session_id/window_NNN/frame_000.jpg")]:::unchanged
    end

    subgraph FEED_API["Feed Lambda"]
        FEED["feed app.py\napi/memories/feed/app.py"]:::unchanged
    end

    subgraph APP["Mobile App"]
        HOOK["useMemoriesFeed\nlib/api/memory/useMemoryApi.ts"]:::unchanged
        LIST["listMemories\nlib/api/memory/listMemories.ts"]:::unchanged
        SCREEN["index.tsx\napp/app/index.tsx"]:::unchanged
        MFEED["MemoryFeed\ncomponents/memory/MemoryFeed.tsx"]:::unchanged
        MCARD["MemoryCard\ncomponents/memory/MemoryCard.tsx"]:::unchanged
        MROW["MemoryRow\ncomponents/memory/MemoryRow.tsx"]:::unchanged
        MODAL["MemoryViewerModal\ncomponents/memory/memory-viewer-modal.tsx"]:::unchanged
    end

    GPU -->|"caption, NER entities, triples, 1536-dim visual embedding"| INGEST
    INGEST -->|"create_segment, create_entity, create_triple"| ORM
    INGEST -->|"store_visual_embedding"| ORM
    ORM -->|"SQL INSERT"| SEG
    ORM -->|"SQL INSERT"| ENT
    ORM -->|"SQL INSERT"| TRP
    ORM -->|"SQL INSERT"| EMB
    INGEST -->|"frame_000.jpg at s3_frames_key"| FRAMES

    SCREEN -->|"infinite query"| HOOK
    HOOK -->|"cursor + limit"| LIST
    LIST -->|"POST /memories/feed"| FEED
    FEED -->|"SELECT worldmm_segments ORDER BY start_time DESC LIMIT n+1"| ORM
    ORM -->|"segment rows"| FEED
    FEED -->|"presign s3_frames_key"| S3
    S3 -->|"presigned URL string"| FEED
    FEED -->|"MemoryFeedItem[] + next_cursor"| LIST
    LIST -->|"ListMemoriesResponse"| HOOK
    HOOK -->|"flattened pages of MemoryFeedItem[]"| SCREEN
    SCREEN -->|"memories + pagination callbacks"| MFEED
    MFEED -->|"type=visual MemoryFeedItem"| MCARD
    MFEED -->|"type=audio or text MemoryFeedItem"| MROW
    MCARD -->|"onPress index"| SCREEN
    MROW -->|"onPress index"| SCREEN
    SCREEN -->|"memories + initialIndex"| MODAL

classDef unchanged fill:#d3d3d3,stroke:#666,stroke-width:1px
classDef updated fill:#ffe58a,stroke:#666,stroke-width:1px
classDef deleted fill:#f4a6a6,stroke:#666,stroke-width:1px
classDef created fill:#a8e6a3,stroke:#666,stroke-width:1px

2. Black-Box Inputs and Outputs

Global Types

MemoryFeedItem {
  id:        string                        (UUID — worldmm_segments PK)
  time:      string                        (ISO 8601 — segment start_time)
  type:      "text" | "audio" | "visual"   (NEW — derived from segment content; see type derivation rules below)
  thumbnail?: string                       (CHANGED from url — presigned S3 URL, 1hr TTL; omitted if no frame stored)
  featured:  boolean                       (always false currently)
}

ListMemoriesResponse {
  memories:    MemoryFeedItem[]
  next_cursor: string | null    (integer offset encoded as a string; null = no more pages)
}

Type derivation rules (server-side, no new DB columns needed):

condition	`type` value
`s3_frames_key` is non-null	`"visual"`
`s3_frames_key` is null and `transcript` is non-null	`"audio"`
`s3_frames_key` is null and `transcript` is null	`"text"`
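
The derivation rules above, together with the optional thumbnail field, can be sketched as two small functions. This is a minimal illustration, not the feed Lambda's actual code: `to_feed_item` and the `presign` callable are hypothetical names, and the segment row is assumed to arrive as a plain dict keyed by column name.

```python
from typing import Callable, Optional


def derive_type(s3_frames_key: Optional[str], transcript: Optional[str]) -> str:
    """Map a segment's stored columns to the feed item's `type` value."""
    if s3_frames_key is not None:
        return "visual"
    if transcript is not None:
        return "audio"
    return "text"


def to_feed_item(segment: dict, presign: Callable[[str], str]) -> dict:
    """Build a MemoryFeedItem-shaped dict from a segment row (hypothetical helper)."""
    key = segment.get("s3_frames_key")
    item = {
        "id": segment["id"],
        "time": segment["start_time"],
        "type": derive_type(key, segment.get("transcript")),
        "featured": False,  # always false currently, per the type definition
    }
    if key is not None:
        # thumbnail is omitted entirely (not set to null) when no frame is stored
        item["thumbnail"] = presign(key)
    return item
```

In the Lambda, `presign` would presumably wrap boto3's `generate_presigned_url("get_object", Params={"Bucket": ..., "Key": key}, ExpiresIn=3600)`; the bucket name is not specified in this plan.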

Pipeline Output: what the ingest pipeline writes

The pipeline is a black box to the feed system. Its only observable output is rows in these four tables:

worldmm_segments — one row per 30-second window

column	type	notes
`id`	string UUID	PK
`user_id`	string	owner
`start_time`	string ISO 8601	window start
`end_time`	string ISO 8601	window end
`duration_seconds`	int	always 30
`caption`	text \| null	GPU-generated description of the window
`s3_frames_key`	string \| null	S3 path to representative frame (`sessions/{session_id}/window_{index:03d}/frame_000.jpg`)
`transcript`	text \| null	speech-to-text from window audio
`source_session_id`	string \| null	originating session ID
`source_window_index`	int \| null	0-based index within session

Unique constraint: (user_id, source_session_id, source_window_index) — prevents duplicate ingestion.
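
How the unique constraint makes re-ingestion idempotent can be shown with a small runnable sketch. It uses an in-memory SQLite table as a stand-in for PostgreSQL and only the columns involved in the constraint; the `ingest` helper is hypothetical and is not the pipeline's ORM call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE worldmm_segments (
        id TEXT PRIMARY KEY,
        user_id TEXT NOT NULL,
        source_session_id TEXT,
        source_window_index INTEGER,
        UNIQUE (user_id, source_session_id, source_window_index)
    )
    """
)


def ingest(segment_id, user_id, session_id, window_index):
    """Insert a segment; return True if a row was written, False if it was a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO worldmm_segments VALUES (?, ?, ?, ?)",
        (segment_id, user_id, session_id, window_index),
    )
    return cur.rowcount == 1
```

On PostgreSQL the analogous statement is `INSERT ... ON CONFLICT DO NOTHING`. Note that SQL unique constraints treat NULLs as distinct by default, so rows with a null source_session_id or source_window_index would not be deduplicated by this constraint.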

worldmm_entities — knowledge graph nodes extracted from captions

column	type	notes
`id`	string UUID	PK
`user_id`	string	owner
`surface_form`	string	raw text from caption (e.g. "John")
`canonical_name`	string \| null	normalized form (e.g. "John Smith")
`embedding_json`	text \| null	1536-dim vector as JSON

worldmm_triples — episodic and semantic facts

column	type	notes
`id`	string UUID	PK
`segment_id`	string UUID FK	links to `worldmm_segments`
`user_id`	string	owner
`memory_type`	string	`"episodic"` or `"semantic"`
`subject_entity_id`	string UUID FK	subject entity
`predicate`	string	relation (e.g. `"is_doing"`, `"is_at"`)
`object_entity_id`	string UUID FK \| null	object as entity (mutually exclusive with literal)
`object_literal`	string \| null	object as literal value (mutually exclusive with entity)
`invalidated_at`	string ISO 8601 \| null	soft-delete; NULL = active
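
Two rules in the table above are easy to get wrong in consumers: the object is either an entity or a literal, never both, and a non-null invalidated_at means the triple is soft-deleted. A minimal sketch follows, assuming triples are loaded into a hypothetical `Triple` dataclass. Whether a triple with neither object form is valid is not stated in this plan, so the sketch only rejects the case where both are set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Triple:
    subject_entity_id: str
    predicate: str
    object_entity_id: Optional[str] = None
    object_literal: Optional[str] = None
    invalidated_at: Optional[str] = None  # ISO 8601; None means active

    def __post_init__(self) -> None:
        # Entity and literal objects are mutually exclusive.
        if self.object_entity_id is not None and self.object_literal is not None:
            raise ValueError("object_entity_id and object_literal are mutually exclusive")

    @property
    def is_active(self) -> bool:
        return self.invalidated_at is None


def active_triples(triples: List[Triple]) -> List[Triple]:
    """Drop soft-deleted triples, mirroring a WHERE invalidated_at IS NULL filter."""
    return [t for t in triples if t.is_active]
```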

worldmm_visual_embeddings — per-window semantic search index

column	type	notes
`id`	string UUID	PK
`segment_id`	string UUID FK	links to `worldmm_segments`
`user_id`	string	owner
`timestamp`	string ISO 8601	frame capture time
`embedding_json`	text \| null	1536-dim vector as JSON (pgvector on PostgreSQL)
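
Because embedding_json is a nullable text column holding a JSON array, readers and writers need to agree on the encoding. The following is a sketch of one possible round-trip with a dimension check; the helper names are hypothetical and the 1536 dimension is taken from the tables above.

```python
import json
from typing import List, Optional

EMBEDDING_DIM = 1536  # dimension stated in the tables above


def encode_embedding(vector: List[float]) -> str:
    """Serialize a vector to the JSON text stored in embedding_json."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(vector)}")
    return json.dumps(vector)


def decode_embedding(embedding_json: Optional[str]) -> Optional[List[float]]:
    """Parse embedding_json back into floats; a null column yields None."""
    if embedding_json is None:
        return None
    vector = json.loads(embedding_json)
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(vector)}")
    return [float(x) for x in vector]
```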

Flow: `listMemories` — paginated memory feed

Test files: main/server/tests/integration/test_memories_feed_pagination.py

Request (POST /memories/feed)

{
  cursor?: string | null  (integer offset encoded as a string; null or omitted for the first page)
  limit?:  number         (default 20, max 50)
}

Response

{
  memories:    MemoryFeedItem[]
  next_cursor: string | null
}
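
A handler has to turn the optional request fields into a concrete offset and limit before querying. One possible normalization is sketched below. `parse_feed_request` is a hypothetical helper, and clamping an out-of-range limit (rather than rejecting it with a 4xx) is an assumption, since this plan only states the default and the maximum.

```python
from typing import Tuple

DEFAULT_LIMIT = 20
MAX_LIMIT = 50


def parse_feed_request(body: dict) -> Tuple[int, int]:
    """Return (offset, limit) from a POST /memories/feed body."""
    cursor = body.get("cursor")
    # A missing or null cursor means the first page.
    offset = 0 if cursor is None else int(cursor)  # ValueError on a non-integer cursor
    if offset < 0:
        raise ValueError("cursor must be a non-negative integer string")

    raw_limit = body.get("limit")
    limit = DEFAULT_LIMIT if raw_limit is None else int(raw_limit)
    # Assumption: out-of-range limits are clamped to [1, MAX_LIMIT].
    limit = max(1, min(limit, MAX_LIMIT))
    return offset, limit
```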

path-name	input	output	path-type
`feed.first-page`	valid JWT, no cursor	first `limit` items newest-first, `next_cursor` set if more exist	happy path
`feed.paginated`	valid JWT + cursor	next page of items, `next_cursor` null on last page	happy path
`feed.empty`	valid JWT, no segments	`{ memories: [], next_cursor: null }`	subpath
`feed.no-thumbnail`	segment has no `s3_frames_key`	item returned without `thumbnail` field, `type` = `"audio"` or `"text"`	subpath
`feed.type-visual`	segment has `s3_frames_key` non-null	`type: "visual"`, `thumbnail` present	subpath
`feed.type-audio`	no `s3_frames_key`, `transcript` non-null	`type: "audio"`, no `thumbnail`	subpath
`feed.type-text`	no `s3_frames_key`, no `transcript`	`type: "text"`, no `thumbnail`	subpath
`feed.unauthenticated`	no JWT	401	error
`feed.user-isolation`	valid JWT user A	never returns segments owned by user B	security

Ordering: DESC(start_time), DESC(id). Newest segments come first; ties are broken by segment UUID.
Pagination: the server treats cursor as an integer offset (0 when absent) and fetches limit + 1 rows. If more than limit rows come back, it returns the first limit rows and sets next_cursor = str(offset + limit); otherwise next_cursor is null.
Presigned URLs: generated per item via boto3 at response time, TTL = 3600s. A null s3_frames_key means the thumbnail field is omitted entirely.
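
The limit + 1 look-ahead can be sketched independently of the database. In this minimal illustration, `paginate` and `make_in_memory_fetch` are hypothetical names, and an in-memory list stands in for the worldmm_segments query. It also assumes start_time strings share one ISO 8601 format so that string order matches time order.

```python
from typing import Callable, List, Optional, Tuple


def paginate(
    fetch_rows: Callable[[int, int], List[dict]],
    offset: int,
    limit: int,
) -> Tuple[List[dict], Optional[str]]:
    """Fetch one page using the limit + 1 look-ahead rule."""
    rows = fetch_rows(offset, limit + 1)  # one extra row signals that another page exists
    has_more = len(rows) > limit
    next_cursor = str(offset + limit) if has_more else None
    return rows[:limit], next_cursor


def make_in_memory_fetch(segments: List[dict]) -> Callable[[int, int], List[dict]]:
    """Stand-in for: ORDER BY start_time DESC, id DESC OFFSET :offset LIMIT :n."""
    ordered = sorted(segments, key=lambda s: (s["start_time"], s["id"]), reverse=True)
    return lambda offset, n: ordered[offset : offset + n]
```

With five segments and a limit of 2, successive calls return cursors "2", "4", and then None on the last page. Because the cursor is a plain offset, segments ingested between two page fetches shift later pages by one position, which may be one reason the client re-sorts across all loaded pages.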

3. Technical Details

Display: how `MemoryFeedItem[]` becomes UI

index.tsx
  └─ useMemoriesFeed({ limit: 20 })        ← React Query infinite query
       └─ listMemories({ cursor, limit })   ← POST /memories/feed
  └─ sortedMemories                         ← client-side sort newest-first over all pages
  └─ <MemoryFeed memories={sortedMemories}> ← mixed layout FlatList
       └─ type = "visual"  → <MemoryCard>  ← tile in 2-column grid, renders memory.thumbnail
       └─ type = "audio"   → <MemoryRow>   ← full-width row, shows transcript/audio indicator
       └─ type = "text"    → <MemoryRow>   ← full-width row, shows caption text
            onPress(index) → setViewerIndex(index)
  └─ <MemoryViewerModal
         memories={sortedMemories}
         initialIndex={viewerIndex}>        ← horizontal paged FlatList, full-screen
         image source = memory.thumbnail
         fallback = placeholder if thumbnail absent (type = "text" or "audio")

MemoryFeed layout rules:
- FlatList with mixed layout; render mode is determined by the type field
- type: "visual" → <MemoryCard> tile in 2-column grid (same as before)
- type: "audio" → <MemoryRow> full-width row showing transcript or audio indicator
- type: "text" → <MemoryRow> full-width row showing caption text
- featured: true visual items span full width (currently unused)
- Infinite scroll: onEndReachedThreshold={0.4} triggers fetchNextPage()
- Pull-to-refresh: calls refetch()

MemoryViewerModal:
- Horizontal FlatList with pagingEnabled
- Full-screen image from memory.thumbnail (absent for text/audio types; show a placeholder)
- Close button with safe-area inset

Security Invariants

user_id always sourced from JWT — never trusted from request body
Feed query always filters by user_id — no cross-user data leakage
Presigned URLs are time-scoped (1hr) and generated server-side per request
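
The first two invariants can be illustrated with a short sketch. The event shape is an assumption (verified claims under requestContext.authorizer.claims, as an API Gateway authorizer might supply them) because this plan does not describe how the JWT reaches the Lambda. `resolve_user_id` and `scope_to_user` are hypothetical names.

```python
from typing import List


def resolve_user_id(event: dict) -> str:
    """Return the caller's user ID from verified JWT claims only."""
    request_context = event.get("requestContext") or {}
    authorizer = request_context.get("authorizer") or {}
    claims = authorizer.get("claims") or {}
    user_id = claims.get("sub")
    if not user_id:
        # The handler would map this to a 401 response.
        raise PermissionError("missing or unauthenticated JWT")
    # Any user_id in the request body is deliberately never read.
    return user_id


def scope_to_user(rows: List[dict], user_id: str) -> List[dict]:
    """In-memory equivalent of the WHERE user_id = :user_id filter on the feed query."""
    return [r for r in rows if r["user_id"] == user_id]
```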