Chat Memory Search

Plan Metadata

Plan type: plan
Parent plan: N/A
Depends on: N/A
Status: documentation

System Intent

What is being built: The end-to-end flow that answers a user's chat question by searching three types of memory (episodic, semantic, visual) using a multi-round reasoning agent and a GPU worker. Includes persistent chat threads stored in DynamoDB, a chat history list in the hamburger menu, and paginated message loading.
Primary consumer(s): Chat screen in the mobile app (chat.tsx), side menu (side-menu.tsx)
Boundary: User submits a natural-language question → system retrieves relevant memories across three retrieval backends → GPU LLM synthesizes an answer → answer and messages stored in DynamoDB → answer returned to the app. Users can browse past chats from the hamburger menu and load previous messages with pagination.

Stage Gate Tracker

[x] Stage 1 Mermaid approved
[x] Stage 2 I/O contracts approved
[x] Stage 3 pseudocode/technical details approved

1. Mermaid Diagram

flowchart TD
    subgraph APP["App"]
        CHAT["chat.tsx\napp/app/chat.tsx"]:::done
        SIDE_MENU["SideMenu\ncomponents/side-menu.tsx"]:::unchanged
        CWM["chatWithMemory.ts\nlib/api/memory/chatWithMemory.ts"]:::done
        CWMV["chatWithMemoryVerbose.ts\nlib/api/memory/chatWithMemoryVerbose.ts"]:::done
        FETCH_CHATS["fetchChats.ts\nlib/api/chats/fetchChats.ts"]:::unchanged
        USE_CHATS["useChatsApi.ts\nlib/api/chats/useChatsApi.ts"]:::done
        GET_MSGS["getChatMessages.ts\nlib/api/chats/getChatMessages.ts"]:::done
    end

    subgraph LAMBDA["Backend Lambda"]
        CHAT_LAMBDA["chat — app.py\napi/memories/chat/app.py"]:::done
        AGENT["ReasoningAgent\nretrieval/agent.py"]:::unchanged
        DB_LOADER["db_loader.py\nretrieval/db_loader.py"]:::unchanged
        CHATS_LAMBDA["chats_list — app.py\napi/chats/list/app.py"]:::done
        MSGS_LAMBDA["chat_messages — app.py\napi/chats/messages/app.py"]:::done
    end

    subgraph CHAT_STORE["Chat Store"]
        CHAT_REPO["ChatRepository\nchat/repository.py"]:::done
        DDB[("DynamoDB\nMessages: PK=chat_id SK=ulid\nChats: PK=user_id SK=chat_id")]:::unchanged
    end

    subgraph RETRIEVERS["Retrievers"]
        EP["episodic_retriever.py\nretrieval/episodic_retriever.py"]:::unchanged
        SEM["semantic_retriever.py\nretrieval/semantic_retriever.py"]:::unchanged
        VIS["visual_retriever.py\nretrieval/visual_retriever.py"]:::unchanged
    end

    subgraph EMBEDDINGS["Embedders and LLM"]
        EMB["TextEmbedder\nmemory/text_embedder.py"]:::unchanged
        VLM["VLM2VecClient\nmemory/visual/encoder.py"]:::unchanged
        LLM["GPULLMClient\nllm/client.py"]:::unchanged
    end

    subgraph PGDB["PostgreSQL + pgvector"]
        ORM["worldmm_orm.py\nshared/orm/worldmm_orm.py"]:::unchanged
        PG[("PostgreSQL + pgvector")]:::unchanged
    end

    subgraph GPU["GPU Worker EC2"]
        GPU_SRV["FastAPI server\ngpu_worker/server.py"]:::unchanged
    end

    SIDE_MENU -->|"chat list + pagination"| USE_CHATS
    SIDE_MENU -->|"selected chatId"| CHAT
    USE_CHATS -->|"useChatsFeed paginated fetch"| FETCH_CHATS
    USE_CHATS -->|"useChatMessages paginated fetch"| GET_MSGS
    FETCH_CHATS -->|"POST /chats/list"| CHATS_LAMBDA

    CHAT -->|"useChatWithMemory mutation"| USE_CHATS
    CHAT -->|"paginated messages"| USE_CHATS

    USE_CHATS -->|"useChatWithMemory"| CWM
    USE_CHATS -->|"useChatWithMemory verbose"| CWMV

    GET_MSGS -->|"GET /chats/messages?chat_id and cursor"| MSGS_LAMBDA

    CWM -->|"POST /memories/chat"| CHAT_LAMBDA
    CWMV -->|"POST /memories/chat/verbose"| CHAT_LAMBDA

    CHAT_LAMBDA -->|"load graphs"| DB_LOADER
    CHAT_LAMBDA -->|"question + context"| AGENT
    CHAT_LAMBDA -->|"user and assistant messages"| CHAT_REPO

    CHATS_LAMBDA -->|"user_id + cursor"| CHAT_REPO
    MSGS_LAMBDA -->|"chat_id + cursor"| CHAT_REPO
    CHAT_REPO -->|"DynamoDB queries"| DDB

    DB_LOADER -->|"ORM queries"| ORM
    ORM -->|"SQL"| PG

    AGENT -->|"search query"| EP
    AGENT -->|"search query"| SEM
    AGENT -->|"search query"| VIS
    AGENT -->|"question + memory"| LLM

    EP -->|"text query"| EMB
    SEM -->|"text query"| EMB
    VIS -->|"video query"| VLM
    EMB -->|"text"| VLM
    LLM -->|"generate prompt"| VLM
    VLM -->|"HTTP encode-text encode-video generate"| GPU_SRV

    SEM -->|"pgvector search"| ORM
    VIS -->|"pgvector search"| ORM

classDef unchanged fill:#d3d3d3,stroke:#666,stroke-width:1px
classDef updated fill:#ffe58a,stroke:#666,stroke-width:1px
classDef deleted fill:#f4a6a6,stroke:#666,stroke-width:1px
classDef created fill:#a8e6a3,stroke:#666,stroke-width:1px
classDef done fill:#b0b0b0,stroke:#666,stroke-width:1px

2. Black-Box Inputs and Outputs

Global Types

ChatMessage {
  message_id: string  (ULID — sort key, encodes timestamp)
  chat_id:    string
  role:       "user" | "assistant"
  content:    string
  created_at: string  (ISO 8601, for display)
}

ChatSummary {
  id:           string  (chat_id)
  title:        string  (first user message, truncated to 64 chars)
  last_message: string
  updated_at:   string (ISO 8601)
}
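
The two record shapes above could be mirrored on the backend roughly as follows. This is a minimal sketch, not the project's actual model code: the `make_title` helper and its whitespace-collapsing behavior are assumptions added for illustration; only the 64-character limit comes from the definition above.

```python
from dataclasses import dataclass
from typing import Literal

TITLE_MAX_CHARS = 64  # limit taken from the ChatSummary definition


@dataclass(frozen=True)
class ChatMessage:
    message_id: str  # ULID, doubles as the DynamoDB sort key
    chat_id: str
    role: Literal["user", "assistant"]
    content: str
    created_at: str  # ISO 8601, for display


@dataclass(frozen=True)
class ChatSummary:
    id: str  # chat_id
    title: str
    last_message: str
    updated_at: str  # ISO 8601


def make_title(first_user_message: str) -> str:
    """Hypothetical helper: collapse whitespace, then truncate to 64 chars."""
    collapsed = " ".join(first_user_message.split())
    return collapsed[:TITLE_MAX_CHARS]
```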

Flow: `chatWithMemory` (updated)

Test files: main/app/__tests__/chat-with-memory.test.ts

Request (POST /memories/chat)

{
  question: string   (required, non-empty)
  chat_id?: string   (optional UUID — omit to start a new chat)
}

Response

{ answer: string, chat_id: string | null }   (chat_id is null only on the `chat.gpu-starting` path)

path-name	input	output	path-type
`chat.new`	valid question, no `chat_id`	`{ answer, chat_id }`, messages saved to DynamoDB	happy path
`chat.existing`	valid question + owned `chat_id`	`{ answer, chat_id }`, messages saved to DynamoDB	happy path
`chat.unauthorized`	valid question + unowned `chat_id`	403	error
`chat.gpu-starting`	any valid question	`{ answer: "Chat is not available…", chat_id: null }`, no DB write	subpath
`chat.missing-question`	`question=""`	400	error
`chat.unauthenticated`	no JWT	401	error

Side effect: on success, lambda saves two ChatMessage rows to DynamoDB — one role=user, one role=assistant — and upserts the Chats table entry for ownership.
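
The `chat.missing-question` path and the optional `chat_id` field suggest a small validation step before the agent runs. The sketch below is one way to do it under stated assumptions: `BadRequest` and `parse_chat_request` are hypothetical names, and treating a whitespace-only question as empty is a choice made here, not something the contract specifies.

```python
import json
import uuid
from typing import Optional, Tuple


class BadRequest(Exception):
    """Hypothetical exception the handler would map to HTTP 400."""


def parse_chat_request(raw_body: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (question, chat_id) or raise BadRequest."""
    try:
        body = json.loads(raw_body or "{}")
    except json.JSONDecodeError as exc:
        raise BadRequest("body is not valid JSON") from exc
    if not isinstance(body, dict):
        raise BadRequest("body must be a JSON object")

    question = body.get("question")
    if not isinstance(question, str) or not question.strip():
        raise BadRequest("question is required and must be non-empty")

    chat_id = body.get("chat_id")
    if chat_id is not None:
        try:
            # Normalise to the canonical lowercase, hyphenated form.
            chat_id = str(uuid.UUID(str(chat_id)))
        except ValueError as exc:
            raise BadRequest("chat_id must be a UUID") from exc
    return question.strip(), chat_id
```

Any `user_id` field in the body is deliberately never read, in line with the security invariants in section 3.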

Flow: `chatWithMemoryVerbose` (updated)

Test files: main/app/__tests__/chat-with-memory-verbose.test.ts

Request (POST /memories/chat/verbose)

{
  question: string   (required, non-empty)
  chat_id?: string   (optional UUID — omit to start a new chat)
}

Response

{ answer: string, chat_id: string, trace: Trace }

path-name	input	output	path-type
`chatVerbose.new`	valid question, no `chat_id`	`{ answer, chat_id, trace }`, messages saved to DynamoDB	happy path
`chatVerbose.existing`	valid question + owned `chat_id`	`{ answer, chat_id, trace }`, messages saved to DynamoDB	happy path
`chatVerbose.unauthorized`	valid question + unowned `chat_id`	403	error

Flow: `listChats` (new backend — frontend already exists)

Test files: main/server/tests/unit/test_chats_list.py

Request (POST /chats/list)

{
  cursor?: string | null   (DynamoDB LastEvaluatedKey, base64 encoded)
  limit?:  number          (default 20, max 50)
}

Response

{
  conversations: ChatSummary[]
  next_cursor:   string | null
}

path-name	input	output	path-type
`listChats.success`	valid JWT	paginated `ChatSummary[]` for authenticated user	happy path
`listChats.empty`	valid JWT, no chats	`{ conversations: [], next_cursor: null }`	subpath
`listChats.unauthenticated`	no JWT	401	error

Security: user_id from JWT is the only filter — users can only ever see their own chats.
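
The `cursor` and `limit` rules above (base64-encoded LastEvaluatedKey, default 20, max 50) can be sketched as three small helpers. The function names are hypothetical, and the use of URL-safe base64 over a JSON dump is an assumption; it presumes the key attributes are plain strings.

```python
import base64
import json
from typing import Any, Dict, Optional

DEFAULT_LIMIT = 20
MAX_LIMIT = 50


def clamp_limit(limit: Optional[int]) -> int:
    """Apply the documented default and maximum page size."""
    if limit is None:
        return DEFAULT_LIMIT
    return max(1, min(int(limit), MAX_LIMIT))


def encode_cursor(last_evaluated_key: Optional[Dict[str, Any]]) -> Optional[str]:
    """Turn a DynamoDB LastEvaluatedKey into an opaque cursor string."""
    if not last_evaluated_key:
        return None
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: Optional[str]) -> Optional[Dict[str, Any]]:
    """Inverse of encode_cursor; raises ValueError on a malformed cursor."""
    if not cursor:
        return None
    try:
        decoded = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    except (ValueError, UnicodeError) as exc:
        raise ValueError("malformed cursor") from exc
    if not isinstance(decoded, dict):
        raise ValueError("malformed cursor")
    return decoded
```

Because the cursor is supplied by the client, it is probably worth checking that the decoded partition key equals the JWT `user_id` before passing it to DynamoDB.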

Flow: `getChatMessages` (new)

Test files: main/server/tests/unit/test_chat_messages.py

Request (GET /chats/messages?chat_id=&cursor=&limit=)

{
  chat_id: string  (required)
  cursor?: string | null  (ULID of last fetched message — ExclusiveStartKey)
  limit?:  number  (default 20, max 50)
}

Response

{
  messages:    ChatMessage[]  (newest-first)
  next_cursor: string | null
}

path-name	input	output	path-type
`getMessages.success`	valid JWT + owned chat_id	paginated `ChatMessage[]` newest-first	happy path
`getMessages.unauthorized`	valid JWT + chat_id not owned by user	403	error
`getMessages.not-found`	valid JWT + unknown chat_id	403 (same as unauthorized — don't leak existence)	error
`getMessages.unauthenticated`	no JWT	401	error

Security: Lambda checks Chats table for (user_id, chat_id) pair before reading Messages. Returns 403 (not 404) to avoid leaking whether a chat exists.
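
Since the message cursor is a bare ULID while DynamoDB's ExclusiveStartKey expects the full primary key, the handler has to rebuild the key from `chat_id` plus the cursor. A minimal sketch follows, assuming the key attributes are literally named `chat_id` and `message_id` (the plan only states PK=chat_id, SK=ulid); `build_messages_query` is a hypothetical helper that returns keyword arguments intended for a DynamoDB Query call.

```python
import re
from typing import Any, Dict, Optional

# 26 Crockford base32 characters; the first is at most 7 because a ULID is 128 bits.
_ULID_RE = re.compile(r"^[0-7][0-9A-HJKMNP-TV-Z]{25}$")


def build_messages_query(chat_id: str, cursor: Optional[str], limit: int = 20) -> Dict[str, Any]:
    """Build Query kwargs for one newest-first page of messages."""
    kwargs: Dict[str, Any] = {
        "KeyConditionExpression": "chat_id = :cid",
        "ExpressionAttributeValues": {":cid": chat_id},
        "ScanIndexForward": False,  # newest first
        "Limit": limit,
    }
    if cursor is not None:
        if not _ULID_RE.match(cursor):
            raise ValueError("cursor must be a ULID")
        # ExclusiveStartKey must be omitted entirely on the first page,
        # and must carry both key attributes on later pages.
        kwargs["ExclusiveStartKey"] = {"chat_id": chat_id, "message_id": cursor}
    return kwargs
```

Rebuilding the key from the already-authorized `chat_id` also means a client cannot use the cursor to page into a different chat.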

3. Technical Details

Security Invariants

user_id always sourced from JWT auth context — never trusted from request body
message_id (ULID) always generated server-side — never accepted from client
Ownership check required on both read and write when chat_id is provided
All chat/message endpoints return 403 (never 404) for unowned or unknown chat_id — do not leak existence
Chats table acts as the ownership registry — (user_id, chat_id) must exist before Messages table is touched
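
The first invariant can be illustrated with a helper that reads the caller's identity only from the authorizer context. This sketch assumes an API Gateway HTTP API with a JWT authorizer (payload format 2.0), where claims appear under `requestContext.authorizer.jwt.claims`, and assumes the `sub` claim is the user id; the plan does not state either, so adjust the path to the actual gateway setup. `Unauthorized` and `user_id_from_event` are hypothetical names.

```python
from typing import Any, Dict


class Unauthorized(Exception):
    """Hypothetical exception the handler would map to HTTP 401."""


def user_id_from_event(event: Dict[str, Any]) -> str:
    """Read user_id from the authorizer claims, never from the request body."""
    request_context = event.get("requestContext") or {}
    authorizer = request_context.get("authorizer") or {}
    claims = (authorizer.get("jwt") or {}).get("claims") or {}
    user_id = claims.get("sub")  # assumption: 'sub' carries the user id
    if not user_id:
        raise Unauthorized()
    return str(user_id)
```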

Pseudocode: `chat/app.py` — handle_chat()

handle_chat(user_id, question, chat_id=None):
    is_new_chat = chat_id is None

    if not is_new_chat:
        # ownership check — 403 if not found
        if not repo.chat_owned_by(user_id, chat_id):
            raise Forbidden()

    # run reasoning agent (unchanged)
    answer, trace = agent.answer(question)

    # persist — repo handles ULID generation + dual-table write
    if is_new_chat:
        chat_id = repo.create_chat(user_id)
    repo.save_message(chat_id, role="user",      content=question)
    repo.save_message(chat_id, role="assistant", content=answer)

    return {"answer": answer, "chat_id": chat_id}
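
For unit tests, the flow above can be exercised without DynamoDB or the GPU worker. The following is a sketch only: `InMemoryChatRepo` is a hypothetical test double, and `answer_fn` stands in for the reasoning agent (the trace is dropped for brevity).

```python
import uuid
from typing import Callable, Dict, List, Optional, Set, Tuple


class Forbidden(Exception):
    """Maps to HTTP 403 in the Lambda handler."""


class InMemoryChatRepo:
    """Hypothetical stand-in for ChatRepository, backed by plain containers."""

    def __init__(self) -> None:
        self.chats: Set[Tuple[str, str]] = set()  # (user_id, chat_id)
        self.messages: Dict[str, List[Tuple[str, str]]] = {}  # chat_id -> [(role, content)]

    def create_chat(self, user_id: str) -> str:
        chat_id = str(uuid.uuid4())
        self.chats.add((user_id, chat_id))
        return chat_id

    def chat_owned_by(self, user_id: str, chat_id: str) -> bool:
        return (user_id, chat_id) in self.chats

    def save_message(self, chat_id: str, role: str, content: str) -> None:
        self.messages.setdefault(chat_id, []).append((role, content))


def handle_chat(
    repo: InMemoryChatRepo,
    answer_fn: Callable[[str], str],
    user_id: str,
    question: str,
    chat_id: Optional[str] = None,
) -> Dict[str, str]:
    # Ownership check runs before the agent, so unowned chats cost no GPU time.
    if chat_id is not None and not repo.chat_owned_by(user_id, chat_id):
        raise Forbidden()
    answer = answer_fn(question)
    # The chat row is created only after the agent succeeds.
    if chat_id is None:
        chat_id = repo.create_chat(user_id)
    repo.save_message(chat_id, "user", question)
    repo.save_message(chat_id, "assistant", answer)
    return {"answer": answer, "chat_id": chat_id}
```

Note that a failure inside `answer_fn` leaves no rows behind, which matches the "no DB write" behavior of the `chat.gpu-starting` path.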

Pseudocode: `ChatRepository`

create_chat(user_id) -> chat_id:
    chat_id = uuid4()
    chats_table.put_item({
        PK: user_id,
        SK: chat_id,
        last_activity: now_iso(),
    })
    return chat_id

chat_owned_by(user_id, chat_id) -> bool:
    item = chats_table.get_item(PK=user_id, SK=chat_id)
    return item is not None

save_message(chat_id, role, content):
    messages_table.put_item({
        PK: chat_id,
        SK: ulid(),          # encodes timestamp, sorts by creation time; use a monotonic generator so same-millisecond writes keep their order
        role: role,
        content: content,
        created_at: now_iso(),
    })

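get_messages(user_id, chat_id, cursor=None, limit=20) -> (messages, next_cursor):
    if not chat_owned_by(user_id, chat_id):
        raise Forbidden()
    result = messages_table.query(
        PK=chat_id,
        ScanIndexForward=False,   # newest first
        Limit=min(limit, 50),     # enforce the documented max
        # cursor is the ULID of the last fetched message; DynamoDB needs the
        # full key, and the parameter is omitted entirely when cursor is None
        ExclusiveStartKey={PK: chat_id, SK: cursor} if cursor else <omitted>,
    )
    # return only the ULID part of LastEvaluatedKey, or None on the last page
    next_cursor = result.LastEvaluatedKey.SK if result.LastEvaluatedKey else None
    return result.items, next_cursor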

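list_chats(user_id, cursor=None, limit=20) -> (chats, next_cursor):
    result = chats_table.query(
        PK=user_id,
        ScanIndexForward=False,   # note: orders by SK (chat_id), not by last_activity
        Limit=min(limit, 50),     # enforce the documented max
        # cursor is a base64-encoded LastEvaluatedKey; omit the parameter when cursor is None
        ExclusiveStartKey=base64_decode(cursor) if cursor else <omitted>,
    )
    next_cursor = base64_encode(result.LastEvaluatedKey) if result.LastEvaluatedKey else None
    return result.items, next_cursor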

Implementation Checklist

Frontend

- [x] app/app/chat.tsx: add chatId state (null on new chat, set from first response or from route params on existing), pass to useChatsApi
- [x] lib/api/memory/chatWithMemory.ts: add optional chat_id to request body; handle chat_id in response for new chats
- [x] lib/api/memory/chatWithMemoryVerbose.ts: add optional chat_id to request body; handle chat_id in response for new chats
- [x] lib/api/chats/useChatsApi.ts: add useChatWithMemory mutation + useChatMessages infinite query
- [x] lib/api/chats/getChatMessages.ts: new fetch function for GET /chats/messages

Backend

- [x] api/memories/chat/app.py: accept chat_id, save user + assistant messages to DynamoDB via ChatRepository
- [x] api/memories/chat_verbose/app.py: accept chat_id, same DynamoDB save
- [x] api/chats/list/app.py: new lambda, list chats for authenticated user
- [x] api/chats/messages/app.py: new lambda, get messages with ownership check + pagination
- [x] chat/repository.py: new ChatRepository wrapping boto3 DynamoDB calls

Infrastructure

- [ ] DynamoDB Messages table: PK chat_id, SK ulid
- [ ] DynamoDB Chats table: PK user_id, SK chat_id
- [ ] Wire new lambdas into API Gateway + Terraform
