Text-Only Memory Uploads

Metadata

System type: flow

System Intent

What this is: An investigation of whether the system supports uploading a memory (note, text snippet, or document) without any accompanying audio or video.

Summary Verdict

Not supported end-to-end. The frontend client library has full support for a text upload field in its multipart-upload flow, but the backend Lambda handlers for the /memories/uploads API endpoints do not exist in the deployed server stack. The existing ingest pipeline (IngestWindowFunction) is session-based and requires either audio or video frames as input; there is no code path that accepts a raw text payload as the primary memory source.

Mermaid Diagram

flowchart TD
  FE["Frontend: createNewMemory()\nmain/app/lib/api/memory/createNewMemory.ts"]
  FE -->|POST /memories/uploads| Missing["MISSING: Backend Lambda\napi/memories/uploads/ has no app.py"]
  FE2["Existing capture path\n(audio/video sessions)"]
  FE2 -->|POST /sessions/start| SessionStart["SessionStartFunction\napi/sessions/start/app.py"]
  SessionStart --> Frames["FramePostFunction"]
  SessionStart --> Audio["AudioPostFunction"]
  Frames --> Ingest["IngestWindowFunction\nworldmm/pipeline/ingest_window.py"]
  Audio --> AudioEvent["AudioUploadCompleteFunction\nevents/audio_upload_complete/app.py"]
  AudioEvent --> Ingest
  Ingest -->|audio-only path| DB["PostgreSQL: worldmm_segments"]
  Ingest -->|video path| GPU["GPU Worker (caption + embeddings)"]
  GPU --> DB

Flows

Flow: `createNewMemory` (frontend client — partial implementation)

Core files:
main/app/lib/api/memory/createNewMemory.ts
main/app/lib/api/memory/createNewMemoryUploadFlow.ts
main/app/lib/api/memory/raw-fields.ts
main/app/lib/api/memory/startMemoryUpload.ts
main/app/lib/api/memory/completeMemoryUpload.ts
main/app/lib/api/memory/uploadMemoryPart.ts
Test files: main/app/__tests__/create-new-memory.test.ts, main/app/__tests__/memory-api-mpu.test.ts

Types

RawField = "text" | "photos" | "audio" | "video"
  (main/app/lib/api/memory/raw-fields.ts:3 — derived from RAW_FIELDS constant at :1)

UploadItemInput {
  content_type: string (required)
  size_bytes: number (required)
  body: Blob (required)
}

CreateNewMemoryPayload {
  user_id: string (required)
  uploads: Partial<Record<RawField, UploadItemInput[]>>
    -- at least one field must be non-empty
    -- "text" uploads are explicitly accepted
  part_size_bytes?: number (default 5 MB)
  signal?: AbortSignal
}

CreateNewMemoryResponse {
  memory_id: string
}

Paths

path	input	output	path-type	notes
`createNewMemory.text-only`	`{ uploads: { text: [...] } }`	frontend client error / 404	error	Backend endpoints do not exist
`createNewMemory.mixed`	`{ uploads: { text: [...], photos: [...] } }`	frontend client error / 404	error	Same — no backend Lambda for `/memories/uploads`

Pseudocode

createNewMemory(payload):
  validate: at least one uploads field has items
  POST /memories/uploads → start upload, get memory_id + upload_sessions[]
  for each field in uploads, for each item in uploads[field]:
    POST /memories/uploads/{upload_id}/parts → get presigned URL
    PUT presigned_url ← binary blob
  for each upload session:
    POST /memories/uploads/{upload_id}/complete
  return memory_id

The frontend zod schema (UploadsSchema in createNewMemory.ts:38-43) explicitly marks text as optional alongside photos, audio, and video. The validation at lines 52-57 requires at least one field to be non-empty but does not require audio or video.

The text field flows through the same multipart-upload machinery as audio and video. No special text-handling logic exists in the frontend; it treats text blobs the same as any other upload item.
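To make the "same machinery" point concrete, the sketch below shows how an upload item could be cut into parts under the default part_size_bytes. It is an illustration only: plan_parts and its return shape are hypothetical names, not taken from the client code, and "5 MB" is assumed to mean 5 * 1024 * 1024 bytes.

```python
# Assumption: the documented "5 MB" default means 5 MiB.
DEFAULT_PART_SIZE_BYTES = 5 * 1024 * 1024


def plan_parts(
    size_bytes: int, part_size_bytes: int = DEFAULT_PART_SIZE_BYTES
) -> list[tuple[int, int, int]]:
    """Return (part_number, start, end) byte ranges; part numbers start at 1.

    `end` is exclusive, so item bytes [start:end] form one part.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if part_size_bytes <= 0:
        raise ValueError("part_size_bytes must be positive")
    # Integer ceiling division avoids float rounding on large sizes.
    count = (size_bytes + part_size_bytes - 1) // part_size_bytes
    return [
        (i + 1, i * part_size_bytes, min((i + 1) * part_size_bytes, size_bytes))
        for i in range(count)
    ]
```

Under these assumptions a 2 KB text note yields a single part, so a text-only memory would still go through start, one parts request, one PUT, and complete, exactly like a small audio file.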

Flow: `IngestWindowFunction` (existing session-based ingest)

Core files: main/server/worldmm/pipeline/ingest_window.py
Template: main/server/template.yaml:402

This is the only deployed Lambda that writes memory segments to the DB. It requires:

Input field	Required?	Notes
`sessionId`	yes	S3 prefix for frames/audio
`userId`	yes
`windowIndex`	yes	0-indexed 30-second window
`frameCount`	yes	0 = audio-only path

There is an audio-only path (ingest_window.py:638-655): when no frames exist in S3, the function skips GPU enrichment, transcribes audio, generates a title, marks the segment complete, and returns without needing a GPU. (Lines 563-572 are a separate guard that skips processing when frame_count > 0 but frames are missing from S3.) This path handles phone audio-only sessions but still requires audio data uploaded to S3 first.

There is no text-only path in IngestWindowFunction. If frame_count == 0 and no audio is in S3, the segment is created with processing_status=pending and then immediately marked complete with no content (no transcript, no caption, no title).
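The branching described above can be summarized as a small decision function. This is a model of the documented behavior, not the code in ingest_window.py; the function name, its boolean inputs, and the returned labels are all hypothetical.

```python
def select_ingest_path(frame_count: int, frames_in_s3: bool, audio_in_s3: bool) -> str:
    """Model which branch a window would take, per the behavior described in this report."""
    if frame_count > 0 and not frames_in_s3:
        # Guard: frames were announced but are missing from S3, so processing is skipped.
        return "skip"
    if frame_count > 0:
        # Video path: GPU worker produces captions and embeddings.
        return "video"
    if audio_in_s3:
        # Audio-only path: transcribe, generate a title, mark complete, no GPU.
        return "audio_only"
    # No frames and no audio: the segment is marked complete with no content.
    # A text-only memory would land here today, because no branch reads a text blob.
    return "empty_complete"
```

Note that no combination of inputs leads to a branch that consumes text, which is the gap this investigation identifies.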

Missing Backend Implementation

The following API routes are called by the frontend but have no corresponding Lambda handler in template.yaml and no app.py in the server tree:

Route	Frontend caller	Backend status
`POST /memories/uploads`	`startMemoryUpload` (`startMemoryUpload.ts:57`)	Not implemented — `api/memories/uploads/start/` is an empty directory (only `__pycache__`)
`POST /memories/uploads/{upload_id}/parts`	`uploadMemoryPart` (`uploadMemoryPart.ts`)	Not implemented — `api/memories/uploads/parts/` is empty
`POST /memories/uploads/{upload_id}/complete`	`completeMemoryUpload` (`completeMemoryUpload.ts:94`)	Not implemented — `api/memories/uploads/complete/` is empty

The shared layer defines RAW_UPLOAD_FIELDS = ("text", "photos", "audio", "video") at main/server/layers/shared/python/shared/memory_uploads.py:3, confirming the backend is aware of these field names but has no handler that uses them.
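A future start handler could reuse this tuple to mirror the frontend rule (at least one non-empty field, with no requirement for audio or video). The sketch below is hypothetical: validate_uploads and the request shape are assumptions modeled on the frontend CreateNewMemoryPayload, and the tuple is redefined locally only to keep the example self-contained.

```python
# Copied from shared/memory_uploads.py so the example runs on its own.
RAW_UPLOAD_FIELDS = ("text", "photos", "audio", "video")


def validate_uploads(uploads: dict) -> list[str]:
    """Return the raw fields that carry at least one item, or raise ValueError."""
    unknown = sorted(set(uploads) - set(RAW_UPLOAD_FIELDS))
    if unknown:
        raise ValueError(f"unknown upload fields: {unknown}")
    # Empty lists and missing keys both count as "not provided".
    present = [field for field in RAW_UPLOAD_FIELDS if uploads.get(field)]
    if not present:
        raise ValueError("at least one upload field must be non-empty")
    return present
```

With this rule a payload containing only a text item is accepted, matching what the frontend schema already allows.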

The useCreateNewMemory hook (main/app/lib/api/memory/useMemoryApi.ts:6) is exported but never called from any screen component — no UI surface currently invokes the text upload flow.

What Would Be Needed to Add Support

1. Backend Lambda handlers for the three /memories/uploads routes (start, parts, complete). These need to:
   - start: allocate a memory record and S3 multipart upload sessions for each requested raw_field; return memory_id + upload_sessions[]
   - parts: generate a presigned URL for one S3 multipart part
   - complete: finalize S3 multipart uploads, trigger ingestion
2. A text-only ingest path in IngestWindowFunction (or a new Lambda). The existing function has no branch for a memory whose only content is a text blob stored in S3. A new path would need to:
   - Read the text blob from S3
   - Skip audio transcription and GPU captioning
   - Store the raw text as transcript (or a new field) on the segment
   - Run entity/triple extraction from the text directly (currently only done from GPU-generated captions)
3. DB schema: worldmm_segments already has a transcript column (Text, nullable) which could hold a text-only note. No schema change is strictly required for a minimal implementation, but a source_type discriminator column would help distinguish session recordings from manual text notes.
4. UI surface: useCreateNewMemory needs to be wired to a screen component. Currently it is exported but unused.
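As a rough illustration of the text-only ingest path described above, the sketch below reads a text blob through an injected reader and builds a segment record. Everything named here is hypothetical: ingest_text_memory, the read_object callable (standing in for an S3 read), the extract_triples hook, the "text_note" value, and the "complete" status string. Only the transcript column, the processing_status field, and the idea of a source_type discriminator come from this report.

```python
from typing import Callable, Optional


def ingest_text_memory(
    read_object: Callable[[str], bytes],
    key: str,
    extract_triples: Optional[Callable[[str], list]] = None,
) -> dict:
    """Build a segment record from a text blob, skipping transcription and GPU work."""
    raw = read_object(key)  # in a real Lambda this would be an S3 read
    text = raw.decode("utf-8").strip()  # assumption: text uploads are UTF-8
    if not text:
        raise ValueError("text blob is empty")
    return {
        "transcript": text,  # reuse the existing nullable transcript column
        "source_type": "text_note",  # hypothetical discriminator value
        "processing_status": "complete",  # assumed status string
        # Extraction runs on the text itself rather than on GPU captions.
        "triples": extract_triples(text) if extract_triples else [],
    }
```

Injecting the reader keeps the function testable without AWS access; for example, `ingest_text_memory(lambda key: b"buy milk", "notes/1.txt")` returns a record whose transcript is "buy milk".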

Logs

Source	Location
IngestWindowFunction	CloudWatch: `/aws/lambda/{stack}-IngestWindowFunction`
AudioUploadCompleteFunction	CloudWatch: `/aws/lambda/{stack}-AudioUploadCompleteFunction`
Frontend upload flow	`createFlowLogger("upload-memory")` — client-side console/telemetry

Deployment

Mechanism: SAM

Deploy command:

sam build && sam deploy --template main/server/template.yaml

Notes: The /memories/uploads endpoints are not registered in template.yaml and are not deployed. Any new handlers must be added both as Lambda functions in template.yaml and as app.py files under main/server/api/memories/uploads/.
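For orientation, an app.py for the parts route might look like the sketch below. It is not the project's code: the event shape assumes an API Gateway proxy integration, and make_handler, the sessions mapping (standing in for wherever the start handler would persist upload sessions), the part_number body field, and the response shape are all hypothetical. The S3 client is injected so the example runs without AWS credentials; in a deployed handler it would typically be boto3.client("s3"), whose generate_presigned_url method supports the "upload_part" operation with the parameters used here.

```python
import json


def _response(status: int, payload: dict) -> dict:
    return {"statusCode": status, "body": json.dumps(payload)}


def make_handler(s3_client, bucket: str, sessions: dict, expires_in: int = 900):
    """Build a Lambda handler for POST /memories/uploads/{upload_id}/parts."""

    def lambda_handler(event, context):
        upload_id = (event.get("pathParameters") or {}).get("upload_id")
        session = sessions.get(upload_id)
        if session is None:
            return _response(404, {"error": "unknown upload_id"})
        try:
            part_number = int(json.loads(event.get("body") or "{}")["part_number"])
        except (KeyError, TypeError, ValueError):
            return _response(400, {"error": "part_number is required"})
        # Presign a single S3 multipart part upload for the client to PUT to.
        url = s3_client.generate_presigned_url(
            "upload_part",
            Params={
                "Bucket": bucket,
                "Key": session["key"],
                "UploadId": session["s3_upload_id"],
                "PartNumber": part_number,
            },
            ExpiresIn=expires_in,
        )
        return _response(200, {"url": url, "part_number": part_number})

    return lambda_handler
```

A matching function resource and API event would still have to be declared in template.yaml before any such handler is reachable.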