Memory Fetch and Display

Plan Metadata

  • Plan type: plan
  • Parent plan: N/A
  • Depends on: N/A
  • Status: documentation

System Intent

  • What is being built: The end-to-end flow that stores processed memory segments in PostgreSQL, serves them through a paginated feed API with presigned S3 image URLs, and renders them in the mobile app as a mixed feed (a 2-column photo grid for visual memories, full-width rows for audio and text memories) with a full-screen viewer.
  • Primary consumer(s): Home screen (app/app/index.tsx), MemoryFeed, MemoryCard, MemoryViewerModal components.
  • Boundary: WorldMM pipeline writes memory segments (captions, frame S3 keys, knowledge graph triples, visual embeddings) to PostgreSQL → feed API paginates segments and generates presigned S3 URLs → mobile app fetches via infinite query → renders grid → user taps to open full-screen viewer.

Stage Gate Tracker

  • [x] Stage 1 Mermaid approved
  • [x] Stage 2 I/O contracts approved
  • [x] Stage 3 pseudocode/technical details approved

Revision: renamed url to thumbnail (now optional) and added a type: "text" | "audio" | "visual" field to MemoryFeedItem.

1. Mermaid Diagram

flowchart TD
    subgraph PIPELINE["WorldMM Ingest Pipeline"]
        GPU["GPU Worker EC2\ngpu_worker/server.py"]:::unchanged
        INGEST["ingest_window.py\nworldmm/pipeline/ingest_window.py"]:::unchanged
    end

    subgraph DB["PostgreSQL + pgvector"]
        SEG[("worldmm_segments\nid, user_id, start_time, end_time,\ncaption, s3_frames_key, transcript,\nsource_session_id, source_window_index")]:::unchanged
        ENT[("worldmm_entities\nid, user_id, surface_form,\ncanonical_name, embedding_json")]:::unchanged
        TRP[("worldmm_triples\nid, segment_id, user_id,\nmemory_type, subject, predicate,\nobject, invalidated_at")]:::unchanged
        EMB[("worldmm_visual_embeddings\nid, segment_id, user_id,\ntimestamp, embedding_json")]:::unchanged
        ORM["worldmm_orm.py\nshared/orm/worldmm_orm.py"]:::unchanged
    end

    subgraph S3["S3"]
        FRAMES[("Frame images\nsessions/session_id/window_NNN/frame_000.jpg")]:::unchanged
    end

    subgraph FEED_API["Feed Lambda"]
        FEED["feed app.py\napi/memories/feed/app.py"]:::unchanged
    end

    subgraph APP["Mobile App"]
        HOOK["useMemoriesFeed\nlib/api/memory/useMemoryApi.ts"]:::unchanged
        LIST["listMemories\nlib/api/memory/listMemories.ts"]:::unchanged
        SCREEN["index.tsx\napp/app/index.tsx"]:::unchanged
        MFEED["MemoryFeed\ncomponents/memory/MemoryFeed.tsx"]:::unchanged
        MCARD["MemoryCard\ncomponents/memory/MemoryCard.tsx"]:::unchanged
        MROW["MemoryRow\ncomponents/memory/MemoryRow.tsx"]:::unchanged
        MODAL["MemoryViewerModal\ncomponents/memory/memory-viewer-modal.tsx"]:::unchanged
    end

    GPU -->|"caption, NER entities, triples, 1536-dim visual embedding"| INGEST
    INGEST -->|"create_segment, create_entity, create_triple"| ORM
    INGEST -->|"store_visual_embedding"| ORM
    ORM -->|"SQL INSERT"| SEG
    ORM -->|"SQL INSERT"| ENT
    ORM -->|"SQL INSERT"| TRP
    ORM -->|"SQL INSERT"| EMB
    INGEST -->|"frame_000.jpg at s3_frames_key"| FRAMES

    SCREEN -->|"infinite query"| HOOK
    HOOK -->|"cursor + limit"| LIST
    LIST -->|"POST /memories/feed"| FEED
    FEED -->|"SELECT worldmm_segments ORDER BY start_time DESC, id DESC LIMIT n+1"| ORM
    ORM -->|"segment rows"| FEED
    FEED -->|"presign s3_frames_key"| S3
    S3 -->|"presigned URL string"| FEED
    FEED -->|"MemoryFeedItem[] + next_cursor"| LIST
    LIST -->|"ListMemoriesResponse"| HOOK
    HOOK -->|"flattened pages of MemoryFeedItem[]"| SCREEN
    SCREEN -->|"memories + pagination callbacks"| MFEED
    MFEED -->|"type=visual MemoryFeedItem"| MCARD
    MFEED -->|"type=audio or text MemoryFeedItem"| MROW
    MCARD -->|"onPress index"| SCREEN
    MROW -->|"onPress index"| SCREEN
    SCREEN -->|"memories + initialIndex"| MODAL

classDef unchanged fill:#d3d3d3,stroke:#666,stroke-width:1px
classDef updated fill:#ffe58a,stroke:#666,stroke-width:1px
classDef deleted fill:#f4a6a6,stroke:#666,stroke-width:1px
classDef created fill:#a8e6a3,stroke:#666,stroke-width:1px

2. Black-Box Inputs and Outputs

Global Types

MemoryFeedItem {
  id:        string                        (UUID — worldmm_segments PK)
  time:      string                        (ISO 8601 — segment start_time)
  type:      "text" | "audio" | "visual"   (NEW — derived from segment content; see type derivation rules below)
  thumbnail?: string                       (CHANGED from url — presigned S3 URL, 1hr TTL; omitted if no frame stored)
  featured:  boolean                       (always false currently)
}

ListMemoriesResponse {
  memories:    MemoryFeedItem[]
  next_cursor: string | null    (string offset; null = no more pages)
}
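A Python sketch of these two shapes, assuming the feed Lambda models them with typing.TypedDict (the source does not say how the server declares them). The helper make_item is hypothetical; its point is that thumbnail is a missing key, not a null value, when no frame is stored.

```python
from typing import List, Literal, Optional, TypedDict

MemoryType = Literal["text", "audio", "visual"]


class _MemoryFeedItemBase(TypedDict):
    id: str            # worldmm_segments PK (UUID)
    time: str          # ISO 8601 segment start_time
    type: MemoryType   # derived server-side from segment content
    featured: bool     # always False currently


class MemoryFeedItem(_MemoryFeedItemBase, total=False):
    thumbnail: str     # presigned S3 URL; the key is absent when no frame is stored


class ListMemoriesResponse(TypedDict):
    memories: List[MemoryFeedItem]
    next_cursor: Optional[str]   # string offset; None (JSON null) means no more pages


def make_item(seg_id: str, start_time: str, item_type: MemoryType,
              thumbnail: Optional[str] = None) -> MemoryFeedItem:
    """Hypothetical helper: build a feed item, adding thumbnail only when a URL exists."""
    item: MemoryFeedItem = {"id": seg_id, "time": start_time,
                            "type": item_type, "featured": False}
    if thumbnail is not None:
        item["thumbnail"] = thumbnail
    return item
```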

Type derivation rules (server-side, no new DB columns needed):

condition                                          type value
-------------------------------------------------  ----------
s3_frames_key is non-null                          "visual"
s3_frames_key is null and transcript is non-null   "audio"
s3_frames_key is null and transcript is null       "text"
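The rules reduce to a small pure function. A minimal Python sketch; derive_type is a hypothetical name, and treating only None (SQL NULL) as null, so an empty string still counts as non-null, is an assumption:

```python
from typing import Literal, Optional

MemoryType = Literal["text", "audio", "visual"]


def derive_type(s3_frames_key: Optional[str], transcript: Optional[str]) -> MemoryType:
    """Map a worldmm_segments row to a feed item type using only existing columns."""
    if s3_frames_key is not None:
        return "visual"   # a stored frame wins, even if a transcript also exists
    if transcript is not None:
        return "audio"
    return "text"
```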

Pipeline Output: what the ingest pipeline writes

The pipeline is a black box to the feed system. Its observable outputs are rows in these four tables, plus the representative frame image it uploads to S3 at s3_frames_key:

worldmm_segments — one row per 30-second window

column               type               notes
-------------------  -----------------  -----
id                   string (UUID)      PK
user_id              string             owner
start_time           string (ISO 8601)  window start
end_time             string (ISO 8601)  window end
duration_seconds     int                always 30
caption              text | null        GPU-generated description of the window
s3_frames_key        string | null      S3 path to representative frame (sessions/{session_id}/window_{index:03d}/frame_000.jpg)
transcript           text | null        speech-to-text from window audio
source_session_id    string | null      originating session ID
source_window_index  int | null         0-based index within session

Unique constraint: (user_id, source_session_id, source_window_index) — prevents duplicate ingestion.
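A sketch of how this constraint makes re-ingestion idempotent, using Python's sqlite3 in-memory database as a stand-in for PostgreSQL (both accept INSERT ... ON CONFLICT (...) DO NOTHING; SQLite needs 3.24 or later). The trimmed-down schema and insert_segment are hypothetical, not the pipeline's actual ORM call:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; only the unique-constraint behaviour matters here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE worldmm_segments (
        id TEXT PRIMARY KEY,
        user_id TEXT NOT NULL,
        source_session_id TEXT,
        source_window_index INTEGER,
        UNIQUE (user_id, source_session_id, source_window_index)
    )
    """
)


def insert_segment(seg_id: str, user_id: str, session_id: str, window_index: int) -> bool:
    """Insert a segment; return False when (user, session, window) was already ingested."""
    cur = conn.execute(
        "INSERT INTO worldmm_segments (id, user_id, source_session_id, source_window_index) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT (user_id, source_session_id, source_window_index) DO NOTHING",
        (seg_id, user_id, session_id, window_index),
    )
    return cur.rowcount == 1   # 0 rows changed means the conflict clause skipped the insert
```

Both engines treat NULLs as distinct in a unique constraint by default, so rows with a null source_session_id or source_window_index are not deduplicated by it.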

worldmm_entities — knowledge graph nodes extracted from captions

column          type           notes
--------------  -------------  -----
id              string (UUID)  PK
user_id         string         owner
surface_form    string         raw text from caption (e.g. "John")
canonical_name  string | null  normalized form (e.g. "John Smith")
embedding_json  text | null    1536-dim vector as JSON

worldmm_triples — episodic and semantic facts

column             type                      notes
-----------------  ------------------------  -----
id                 string (UUID)             PK
segment_id         string (UUID)             FK, links to worldmm_segments
user_id            string                    owner
memory_type        string                    "episodic" or "semantic"
subject_entity_id  string (UUID)             FK, subject entity
predicate          string                    relation (e.g. "is_doing", "is_at")
object_entity_id   string (UUID) | null      FK, object as entity (mutually exclusive with object_literal)
object_literal     string | null             object as literal value (mutually exclusive with object_entity_id)
invalidated_at     string (ISO 8601) | null  soft-delete; NULL = active
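The two rules in this table (exactly one object column set; invalidated_at IS NULL means active) can be sketched as a Python dataclass. The class and active_triples are hypothetical, and requiring at least one object column, rather than allowing both to be null, is an assumption:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    id: str
    segment_id: str
    user_id: str
    memory_type: str                      # "episodic" or "semantic"
    subject_entity_id: str
    predicate: str                        # e.g. "is_doing", "is_at"
    object_entity_id: Optional[str] = None
    object_literal: Optional[str] = None
    invalidated_at: Optional[str] = None  # ISO 8601; None means the fact is active

    def __post_init__(self) -> None:
        if self.memory_type not in ("episodic", "semantic"):
            raise ValueError(f"unknown memory_type: {self.memory_type!r}")
        # Assumption: exactly one of object_entity_id / object_literal must be set.
        if (self.object_entity_id is None) == (self.object_literal is None):
            raise ValueError("set exactly one of object_entity_id or object_literal")

    @property
    def is_active(self) -> bool:
        return self.invalidated_at is None


def active_triples(triples: List[Triple]) -> List[Triple]:
    """Drop soft-deleted facts, mirroring a WHERE invalidated_at IS NULL filter."""
    return [t for t in triples if t.is_active]
```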

worldmm_visual_embeddings — per-window semantic search index

column          type               notes
--------------  -----------------  -----
id              string (UUID)      PK
segment_id      string (UUID)      FK, links to worldmm_segments
user_id         string             owner
timestamp       string (ISO 8601)  frame capture time
embedding_json  text | null        1536-dim vector as JSON (pgvector on PostgreSQL)
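For intuition about this search index, a brute-force cosine-similarity ranking over decoded embedding_json values in pure Python. This is an illustrative assumption, not the project's search path; on PostgreSQL, pgvector would normally do this in SQL (its <=> operator is cosine distance). parse_embedding and top_k are hypothetical names:

```python
import json
import math
from typing import List, Optional, Tuple


def parse_embedding(embedding_json: Optional[str], dim: int = 1536) -> Optional[List[float]]:
    """Decode an embedding_json value; None for NULL, ValueError on a wrong length."""
    if embedding_json is None:
        return None
    vec = [float(x) for x in json.loads(embedding_json)]
    if len(vec) != dim:
        raise ValueError(f"expected {dim} dims, got {len(vec)}")
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0   # zero vectors score 0 instead of dividing by 0


def top_k(query: List[float], rows: List[Tuple[str, Optional[str]]],
          k: int = 5, dim: int = 1536) -> List[Tuple[str, float]]:
    """Rank (segment_id, embedding_json) rows by cosine similarity, skipping NULL embeddings."""
    scored = []
    for segment_id, raw in rows:
        vec = parse_embedding(raw, dim)
        if vec is not None:
            scored.append((segment_id, cosine(query, vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```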

Flow: listMemories — paginated memory feed

  • Test files: main/server/tests/integration/test_memories_feed_pagination.py

Request (POST /memories/feed)

{
  cursor?: string | null  (string integer offset; null or omit for first page)
  limit?:  number         (default 20, max 50)
}
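A sketch of request parsing in Python, assuming the body arrives as a decoded dict. Clamping an oversized limit to 50 (rather than rejecting it) and surfacing a malformed cursor as ValueError are assumptions; parse_feed_request is hypothetical. Any user_id in the body is deliberately ignored:

```python
from typing import Any, Dict, Tuple

DEFAULT_LIMIT = 20
MAX_LIMIT = 50


def parse_feed_request(body: Dict[str, Any]) -> Tuple[int, int]:
    """Return (offset, limit) for POST /memories/feed; raises ValueError on bad input."""
    raw_cursor = body.get("cursor")
    # A null or missing cursor means the first page; otherwise it is a string integer offset.
    offset = 0 if raw_cursor is None else int(raw_cursor)
    if offset < 0:
        raise ValueError("cursor must be a non-negative integer offset")

    raw_limit = body.get("limit")
    limit = DEFAULT_LIMIT if raw_limit is None else int(raw_limit)
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Assumption: oversized limits are clamped to the documented maximum rather than rejected.
    return offset, min(limit, MAX_LIMIT)
```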

Response

{
  memories:    MemoryFeedItem[]
  next_cursor: string | null
}

path-name             input                                  output                                                           path-type   updated
--------------------  -------------------------------------  ---------------------------------------------------------------  ----------  -------
feed.first-page       valid JWT, no cursor                   first limit items newest-first, next_cursor set if more exist    happy path
feed.paginated        valid JWT + cursor                     next page of items, next_cursor null on last page                happy path
feed.empty            valid JWT, no segments                 { memories: [], next_cursor: null }                              subpath
feed.no-thumbnail     segment has no s3_frames_key           item returned without thumbnail field, type = "audio" or "text"  subpath
feed.type-visual      segment has s3_frames_key non-null     type: "visual", thumbnail present                                subpath
feed.type-audio       no s3_frames_key, transcript non-null  type: "audio", no thumbnail                                      subpath
feed.type-text        no s3_frames_key, no transcript        type: "text", no thumbnail                                       subpath
feed.unauthenticated  no JWT                                 401                                                              error
feed.user-isolation   valid JWT, user A                      never returns segments owned by user B                           security

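  • Ordering: start_time DESC, then id DESC, so the newest segments come first and ties on start_time are broken by segment UUID.
  • Pagination: the server parses cursor as an integer offset (0 when absent) and fetches limit + 1 rows. If more than limit rows come back, it returns only the first limit rows and sets next_cursor = str(offset + limit); otherwise next_cursor is null.
  • Presigned URLs: generated per item via boto3 at response time, TTL = 3600 s.
  • Null s3_frames_key: the thumbnail field is omitted entirely rather than sent as null.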
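The limit + 1 technique detects a further page without a COUNT query. A minimal Python sketch, assuming the row fetch and the presigner are injected so the logic stays testable; build_feed_page and make_s3_presigner are hypothetical names. The presigner calls boto3's S3 client method generate_presigned_url("get_object", Params=..., ExpiresIn=3600), but the client is passed in rather than created here:

```python
from typing import Any, Callable, Dict, List, Optional

PRESIGN_TTL_SECONDS = 3600
Row = Dict[str, Any]


def make_s3_presigner(s3_client: Any, bucket: str) -> Callable[[str], str]:
    """Wrap a boto3 S3 client so each frame key becomes a GET URL valid for one hour."""
    def presign(key: str) -> str:
        return s3_client.generate_presigned_url(
            "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=PRESIGN_TTL_SECONDS
        )
    return presign


def build_feed_page(fetch_rows: Callable[[int, int], List[Row]],
                    presign: Callable[[str], str],
                    offset: int, limit: int) -> Dict[str, Any]:
    """Fetch limit + 1 rows: the extra row only signals that another page exists."""
    # fetch_rows(offset, n) is assumed to run a user-scoped query ordered
    # start_time DESC, id DESC with LIMIT n OFFSET offset.
    rows = fetch_rows(offset, limit + 1)
    has_more = len(rows) > limit
    memories: List[Row] = []
    for row in rows[:limit]:
        key: Optional[str] = row.get("s3_frames_key")
        if key is not None:
            item_type = "visual"
        elif row.get("transcript") is not None:
            item_type = "audio"
        else:
            item_type = "text"
        item: Row = {"id": row["id"], "time": row["start_time"],
                     "type": item_type, "featured": False}
        if key is not None:
            item["thumbnail"] = presign(key)   # omitted entirely when there is no frame
        memories.append(item)
    return {"memories": memories,
            "next_cursor": str(offset + limit) if has_more else None}
```

Offset cursors are simple but shift when new segments are ingested between page fetches: a newly inserted newest segment pushes one already-seen item onto the next page, so the client can receive the same id twice.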


3. Technical Details

Display: how MemoryFeedItem[] becomes UI

index.tsx
  └─ useMemoriesFeed({ limit: 20 })        ← React Query infinite query
       └─ listMemories({ cursor, limit })   ← POST /memories/feed
  └─ sortedMemories                         ← client-side sort newest-first over all pages
  └─ <MemoryFeed memories={sortedMemories}> ← mixed layout FlatList
       └─ type = "visual"  → <MemoryCard>  ← tile in 2-column grid, renders memory.thumbnail
       └─ type = "audio"   → <MemoryRow>   ← full-width row, shows transcript/audio indicator
       └─ type = "text"    → <MemoryRow>   ← full-width row, shows caption text
            onPress(index) → setViewerIndex(index)
  └─ <MemoryViewerModal
         memories={sortedMemories}
         initialIndex={viewerIndex}>        ← horizontal paged FlatList, full-screen
         image source = memory.thumbnail
         fallback = placeholder if thumbnail absent (type = "text" or "audio")
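The client-side merge step (sortedMemories) can be sketched in Python for clarity, although the app itself is TypeScript. sorted_memories is a hypothetical name, and de-duplicating by id is an assumption the source does not state; it guards against the overlap that offset pagination can produce when new segments arrive between fetches. Timestamps are assumed to all carry a timezone:

```python
from datetime import datetime
from typing import Any, Dict, List, Set

Memory = Dict[str, Any]


def _parse_iso(ts: str) -> datetime:
    # fromisoformat accepts a trailing "Z" only from Python 3.11, so normalise it first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def sorted_memories(pages: List[Dict[str, Any]]) -> List[Memory]:
    """Flatten infinite-query pages, keep the first copy of each id, sort newest-first."""
    seen: Set[str] = set()
    flat: List[Memory] = []
    for page in pages:
        for item in page["memories"]:
            if item["id"] not in seen:
                seen.add(item["id"])
                flat.append(item)
    # Compare parsed datetimes, not raw strings, so "Z" and "+00:00" forms order correctly.
    return sorted(flat, key=lambda m: (_parse_iso(m["time"]), m["id"]), reverse=True)
```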

MemoryFeed layout rules:

  • FlatList with mixed layout; the render mode is determined by the type field
  • type: "visual" → <MemoryCard> tile in the 2-column grid (same as before)
  • type: "audio" → <MemoryRow> full-width row showing the transcript or an audio indicator
  • type: "text" → <MemoryRow> full-width row showing the caption text
  • featured: true visual items span the full width (currently unused)
  • Infinite scroll: onEndReachedThreshold={0.4} triggers fetchNextPage()
  • Pull-to-refresh: calls refetch()
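FlatList's numColumns applies to the whole list, so one way (assumed here, not confirmed by the source) to mix 2-column tiles with full-width rows is to pre-group items into rows. A Python sketch of that grouping logic; build_feed_rows and the row kinds are hypothetical. Rows store indices into the sorted memories so onPress(index) can still drive the viewer's initialIndex:

```python
from typing import Any, Dict, List


def build_feed_rows(memories: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group sorted memories into list rows: pairs of visual tiles or single full-width items."""
    rows: List[Dict[str, Any]] = []
    pending: List[int] = []   # indices of visual tiles waiting for a second column

    def flush() -> None:
        if pending:
            rows.append({"kind": "grid", "indices": list(pending)})
            pending.clear()

    for i, memory in enumerate(memories):
        if memory["type"] == "visual" and not memory.get("featured", False):
            pending.append(i)
            if len(pending) == 2:
                flush()
        else:
            flush()   # a full-width row closes any half-filled grid row first
            kind = "featured" if memory["type"] == "visual" else "row"
            rows.append({"kind": kind, "indices": [i]})
    flush()
    return rows
```

Keeping chronological order means a lone visual item followed by an audio or text item produces a half-filled grid row; reordering to fill pairs would avoid that at the cost of strict newest-first order.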

MemoryViewerModal:

  • Horizontal FlatList with pagingEnabled
  • Full-screen image from memory.thumbnail; text and audio items have no thumbnail, so a placeholder is shown instead
  • Close button positioned with the safe-area inset

Security Invariants

  • user_id always sourced from JWT — never trusted from request body
  • Feed query always filters by user_id — no cross-user data leakage
  • Presigned URLs are time-scoped (1hr) and generated server-side per request
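A sketch of the first two invariants in Python. It assumes an API Gateway HTTP API JWT authorizer (payload format 2.0), which places verified claims under requestContext.authorizer.jwt.claims, and that the user id is the sub claim; neither is confirmed by the source. authenticated_user_id, feed_query_params, FEED_SQL and its psycopg-style %(name)s placeholders are hypothetical:

```python
from typing import Any, Dict

FEED_SQL = (
    "SELECT id, start_time, s3_frames_key, transcript FROM worldmm_segments "
    "WHERE user_id = %(user_id)s "
    "ORDER BY start_time DESC, id DESC LIMIT %(limit)s OFFSET %(offset)s"
)


class Unauthorized(Exception):
    """No verified identity on the request; the handler would map this to HTTP 401."""


def authenticated_user_id(event: Dict[str, Any]) -> str:
    """Read the user id from authorizer-verified JWT claims, never from the request body."""
    claims = event.get("requestContext", {}).get("authorizer", {}).get("jwt", {}).get("claims") or {}
    sub = claims.get("sub")
    if not sub:
        raise Unauthorized("missing verified JWT subject")
    return sub


def feed_query_params(event: Dict[str, Any], offset: int, limit: int) -> Dict[str, Any]:
    """Parameters for FEED_SQL; user_id comes only from the JWT, whatever the body says."""
    return {"user_id": authenticated_user_id(event), "limit": limit + 1, "offset": offset}
```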