Text-Only Memory Uploads

Metadata

  • System type: flow

System Intent

  • What this is: An investigation of whether the system supports uploading a memory (note, text snippet, or document) without any accompanying audio or video.

Summary Verdict

Not supported end-to-end. The frontend client library has full support for a text upload field in its multipart-upload flow, but the backend Lambda handlers for the /memories/uploads API endpoints do not exist in the deployed server stack. The existing ingest pipeline (IngestWindowFunction) is session-based and requires either audio or video frames as input; there is no code path that accepts a raw text payload as the primary memory source.

Mermaid Diagram

flowchart TD
  FE["Frontend: createNewMemory()\nmain/app/lib/api/memory/createNewMemory.ts"]
  FE -->|POST /memories/uploads| Missing["MISSING: Backend Lambda\napi/memories/uploads/ has no app.py"]
  FE2["Existing capture path\n(audio/video sessions)"]
  FE2 -->|POST /sessions/start| SessionStart["SessionStartFunction\napi/sessions/start/app.py"]
  SessionStart --> Frames["FramePostFunction"]
  SessionStart --> Audio["AudioPostFunction"]
  Frames --> Ingest["IngestWindowFunction\nworldmm/pipeline/ingest_window.py"]
  Audio --> AudioEvent["AudioUploadCompleteFunction\nevents/audio_upload_complete/app.py"]
  AudioEvent --> Ingest
  Ingest -->|audio-only path| DB["PostgreSQL: worldmm_segments"]
  Ingest -->|video path| GPU["GPU Worker (caption + embeddings)"]
  GPU --> DB

Flows

Flow: createNewMemory (frontend client — partial implementation)

  • Core files:
  • main/app/lib/api/memory/createNewMemory.ts
  • main/app/lib/api/memory/createNewMemoryUploadFlow.ts
  • main/app/lib/api/memory/raw-fields.ts
  • main/app/lib/api/memory/startMemoryUpload.ts
  • main/app/lib/api/memory/completeMemoryUpload.ts
  • main/app/lib/api/memory/uploadMemoryPart.ts
  • Test files: main/app/__tests__/create-new-memory.test.ts, main/app/__tests__/memory-api-mpu.test.ts

Types

RawField = "text" | "photos" | "audio" | "video"
  (main/app/lib/api/memory/raw-fields.ts:3 — derived from RAW_FIELDS constant at :1)

UploadItemInput {
  content_type: string (required)
  size_bytes: number (required)
  body: Blob (required)
}

CreateNewMemoryPayload {
  user_id: string (required)
  uploads: Partial<Record<RawField, UploadItemInput[]>>
    -- at least one field must be non-empty
    -- "text" uploads are explicitly accepted
  part_size_bytes?: number (default 5 MB)
  signal?: AbortSignal
}

CreateNewMemoryResponse {
  memory_id: string
}

Paths

path | input | output | path-type | notes
createNewMemory.text-only | { uploads: { text: [...] } } | frontend client error / 404 | error | Backend endpoints do not exist
createNewMemory.mixed | { uploads: { text: [...], photos: [...] } } | frontend client error / 404 | error | Same: no backend Lambda for /memories/uploads

Pseudocode

createNewMemory(payload):
  validate: at least one uploads field has items
  POST /memories/uploads → start session, get upload_sessions[]
  for each item in uploads[field]:
    POST /memories/uploads/{upload_id}/parts → get presigned URL
    PUT presigned_url ← binary blob
  POST /memories/uploads/{upload_id}/complete
  return memory_id

The frontend zod schema (UploadsSchema in createNewMemory.ts:38-43) explicitly marks text as optional alongside photos, audio, and video. The validation at lines 52-57 requires at least one field to be non-empty but does not require audio or video.

The text field flows through the same multipart-upload machinery as audio and video. No special text-handling logic exists in the frontend; it treats text blobs the same as any other upload item.
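
To illustrate why a text blob needs no special handling, the part-splitting step of a multipart upload can be sketched as below. This is a minimal sketch, not the client's actual code (which is TypeScript). The function name plan_parts is hypothetical, and "5 MB" is assumed to mean 5 * 1024 * 1024 bytes. Part numbers start at 1, as S3 multipart uploads require.

```python
from typing import List, Tuple

# Assumption: the documented "5 MB" default means 5 MiB.
DEFAULT_PART_SIZE_BYTES = 5 * 1024 * 1024


def plan_parts(
    size_bytes: int, part_size_bytes: int = DEFAULT_PART_SIZE_BYTES
) -> List[Tuple[int, int, int]]:
    """Return (part_number, start, end) tuples; `end` is exclusive.

    Hypothetical helper: mirrors how any upload item (text, audio,
    video) could be cut into parts using only its size.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if part_size_bytes <= 0:
        raise ValueError("part_size_bytes must be positive")

    parts = []
    start = 0
    part_number = 1  # S3 part numbers are 1-based
    while start < size_bytes:
        end = min(start + part_size_bytes, size_bytes)
        parts.append((part_number, start, end))
        start = end
        part_number += 1
    return parts
```

Under these assumptions a short text note fits in a single part, while a 12 MiB recording is split into three. The same function serves both, which matches the observation that the frontend has no text-specific branch.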

Flow: IngestWindowFunction (existing session-based ingest)

  • Core files: main/server/worldmm/pipeline/ingest_window.py
  • Template: main/server/template.yaml:402

This is the only deployed Lambda that writes memory segments to the DB. It requires:

Input field | Required? | Notes
sessionId | yes | S3 prefix for frames/audio
userId | yes |
windowIndex | yes | 0-indexed 30-second window
frameCount | yes | 0 = audio-only path

There is an audio-only path (ingest_window.py:638-655): when no frames exist in S3, the function skips GPU enrichment, transcribes audio, generates a title, marks the segment complete, and returns without needing a GPU. (Lines 563-572 are a separate guard that skips processing when frame_count > 0 but frames are missing from S3.) This path handles phone audio-only sessions but still requires audio data uploaded to S3 first.

There is no text-only path in IngestWindowFunction. If frame_count == 0 and no audio is in S3, the segment is created with processing_status=pending and then immediately marked complete with no content (no transcript, no caption, no title).
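
The branching described above can be summarised as a small decision function. This is a simplified sketch of the behaviour as documented here, not a copy of ingest_window.py. The function name, the boolean inputs, and the returned labels are all hypothetical.

```python
def classify_ingest_path(
    frame_count: int, frames_in_s3: bool, audio_in_s3: bool
) -> str:
    """Name the branch an ingest window would take (labels are illustrative)."""
    if frame_count > 0 and not frames_in_s3:
        # Guard: frames were announced but are missing from S3.
        return "skip_missing_frames"
    if frame_count > 0:
        # Frames present: GPU worker produces captions and embeddings.
        return "video"
    if audio_in_s3:
        # No frames: transcribe audio, generate a title, no GPU needed.
        return "audio_only"
    # No frames and no audio: the segment is marked complete with no content.
    # A text-only memory would land here today, since no branch reads text.
    return "empty_complete"
```

The last branch is the gap: a text-only upload carries neither frames nor audio, so the only outcome available to it is an empty completed segment.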

Missing Backend Implementation

The following API routes are called by the frontend but have no corresponding Lambda handler in template.yaml and no app.py in the server tree:

Route | Frontend caller | Backend status
POST /memories/uploads | startMemoryUpload (startMemoryUpload.ts:57) | Not implemented: api/memories/uploads/start/ contains no app.py (only __pycache__)
POST /memories/uploads/{upload_id}/parts | uploadMemoryPart (uploadMemoryPart.ts) | Not implemented: api/memories/uploads/parts/ is empty
POST /memories/uploads/{upload_id}/complete | completeMemoryUpload (completeMemoryUpload.ts:94) | Not implemented: api/memories/uploads/complete/ is empty
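
For orientation, a handler for the parts route might look roughly like the sketch below. Everything about the request shape is an assumption: the body fields key and part_number, the response field url, and the choice to treat the route's upload_id as the S3 multipart UploadId are all hypothetical. The S3 client is passed in so the logic can be exercised without AWS. In a real Lambda it would be a boto3 S3 client, whose generate_presigned_url method accepts the "upload_part" operation.

```python
import json
from typing import Any, Dict


def handle_parts_request(
    event: Dict[str, Any], s3_client: Any, bucket: str, expires_in: int = 900
) -> Dict[str, Any]:
    """Sketch of POST /memories/uploads/{upload_id}/parts (API Gateway proxy event)."""
    upload_id = (event.get("pathParameters") or {}).get("upload_id")
    body = json.loads(event.get("body") or "{}")
    key = body.get("key")                  # hypothetical request field
    part_number = body.get("part_number")  # hypothetical request field

    valid_part = (
        isinstance(part_number, int)
        and not isinstance(part_number, bool)
        and 1 <= part_number <= 10000      # S3 allows part numbers 1..10000
    )
    if not upload_id or not key or not valid_part:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid request"})}

    # Presign a single UploadPart call; the client PUTs the bytes to this URL.
    url = s3_client.generate_presigned_url(
        "upload_part",
        Params={
            "Bucket": bucket,
            "Key": key,
            "UploadId": upload_id,
            "PartNumber": part_number,
        },
        ExpiresIn=expires_in,
    )
    return {"statusCode": 200, "body": json.dumps({"url": url})}
```

The start and complete routes would follow the same pattern around create_multipart_upload and complete_multipart_upload, with the actual request and response contracts taken from the frontend callers listed in the table.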

The shared layer defines RAW_UPLOAD_FIELDS = ("text", "photos", "audio", "video") at main/server/layers/shared/python/shared/memory_uploads.py:3, confirming the backend is aware of these field names but has no handler that uses them.
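
A future start handler could reuse that constant to mirror the frontend's "at least one non-empty field" rule on the server. The sketch below is hypothetical: validate_uploads does not exist in the codebase, and the constant is redefined locally only to keep the example self-contained.

```python
from typing import Any, Dict, List

# Same value as the constant in shared/memory_uploads.py; copied here
# only so the sketch runs on its own.
RAW_UPLOAD_FIELDS = ("text", "photos", "audio", "video")


def validate_uploads(uploads: Dict[str, List[Any]]) -> List[str]:
    """Return the requested raw fields, or raise ValueError if invalid."""
    unknown = sorted(set(uploads) - set(RAW_UPLOAD_FIELDS))
    if unknown:
        raise ValueError(f"unknown upload fields: {unknown}")

    # A field counts as requested only if it has at least one item.
    requested = [field for field in RAW_UPLOAD_FIELDS if uploads.get(field)]
    if not requested:
        raise ValueError("at least one upload field must be non-empty")
    return requested
```

With this rule, a payload containing only text items is valid, consistent with the frontend schema.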

The useCreateNewMemory hook (main/app/lib/api/memory/useMemoryApi.ts:6) is exported but never called from any screen component — no UI surface currently invokes the text upload flow.

What Would Be Needed to Add Support

  1. Backend Lambda handlers for the three /memories/uploads routes (start, parts, complete). These need to:
     • start: allocate a memory record and S3 multipart upload sessions for each requested raw_field; return memory_id + upload_sessions[]
     • parts: generate a presigned URL for one S3 multipart part
     • complete: finalize S3 multipart uploads, trigger ingestion

  2. A text-only ingest path in IngestWindowFunction (or a new Lambda). The existing function has no branch for a memory whose only content is a text blob stored in S3. A new path would need to:
     • Read the text blob from S3
     • Skip audio transcription and GPU captioning
     • Store the raw text as transcript (or a new field) on the segment
     • Run entity/triple extraction from the text directly (currently only done from GPU-generated captions)

  3. DB schema: worldmm_segments already has a transcript column (Text, nullable) which could hold a text-only note. No schema change is strictly required for a minimal implementation, but a source_type discriminator column would help distinguish session recordings from manual text notes.

  4. UI surface: useCreateNewMemory needs to be wired to a screen component. Currently it is exported but unused.
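
The text-only ingest path could be as small as the sketch below. It is illustrative only. The Segment dataclass stands in for a worldmm_segments row, source_type is the proposed (not yet existing) discriminator column, and the "text_note" value is made up. The function takes bytes already read from S3 (with boto3, for example, via get_object(...)["Body"].read()). Entity/triple extraction is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Stand-in for a worldmm_segments row (only the fields used here)."""
    transcript: Optional[str] = None
    processing_status: str = "pending"
    source_type: str = "session"  # hypothetical discriminator column


def ingest_text_only(raw: bytes, segment: Segment) -> Segment:
    """Store an uploaded text blob as the segment transcript.

    Skips audio transcription and GPU captioning entirely.
    """
    text = raw.decode("utf-8").strip()
    if not text:
        raise ValueError("text upload is empty")

    segment.transcript = text
    segment.source_type = "text_note"  # assumed value, not in the schema today
    segment.processing_status = "complete"
    return segment
```

Reusing the existing transcript column keeps the minimal version free of schema changes. The source_type assignment can be dropped if the discriminator column is not added.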

Logs

Source Location
IngestWindowFunction CloudWatch: /aws/lambda/{stack}-IngestWindowFunction
AudioUploadCompleteFunction CloudWatch: /aws/lambda/{stack}-AudioUploadCompleteFunction
Frontend upload flow createFlowLogger("upload-memory") — client-side console/telemetry

Deployment

  • Mechanism: SAM
  • Deploy command:
    sam build && sam deploy --template main/server/template.yaml
    
  • Notes: The /memories/uploads endpoints are not registered in template.yaml and are not deployed. Any new handlers must be added both as Lambda functions in template.yaml and as app.py files under main/server/api/memories/uploads/.