Text-Only Memory Uploads
Metadata
- System type: flow
System Intent
- What this is: An investigation of whether the system supports uploading a memory (note, text snippet, or document) without any accompanying audio or video.
Summary Verdict
Not supported end-to-end. The frontend client library has full support for a text upload field in its multipart-upload flow, but the backend Lambda handlers for the /memories/uploads API endpoints do not exist in the deployed server stack. The existing ingest pipeline (IngestWindowFunction) is session-based and requires either audio or video frames as input; there is no code path that accepts a raw text payload as the primary memory source.
Mermaid Diagram
flowchart TD
FE["Frontend: createNewMemory()\nmain/app/lib/api/memory/createNewMemory.ts"]
FE -->|POST /memories/uploads| Missing["MISSING: Backend Lambda\napi/memories/uploads/ has no app.py"]
FE2["Existing capture path\n(audio/video sessions)"]
FE2 -->|POST /sessions/start| SessionStart["SessionStartFunction\napi/sessions/start/app.py"]
SessionStart --> Frames["FramePostFunction"]
SessionStart --> Audio["AudioPostFunction"]
Frames --> Ingest["IngestWindowFunction\nworldmm/pipeline/ingest_window.py"]
Audio --> AudioEvent["AudioUploadCompleteFunction\nevents/audio_upload_complete/app.py"]
AudioEvent --> Ingest
Ingest -->|audio-only path| DB["PostgreSQL: worldmm_segments"]
Ingest -->|video path| GPU["GPU Worker (caption + embeddings)"]
GPU --> DB
Flows
Flow: createNewMemory (frontend client — partial implementation)
- Core files:
  - main/app/lib/api/memory/createNewMemory.ts
  - main/app/lib/api/memory/createNewMemoryUploadFlow.ts
  - main/app/lib/api/memory/raw-fields.ts
  - main/app/lib/api/memory/startMemoryUpload.ts
  - main/app/lib/api/memory/completeMemoryUpload.ts
  - main/app/lib/api/memory/uploadMemoryPart.ts
- Test files:
  - main/app/__tests__/create-new-memory.test.ts
  - main/app/__tests__/memory-api-mpu.test.ts
Types
RawField = "text" | "photos" | "audio" | "video"
(main/app/lib/api/memory/raw-fields.ts:3 — derived from RAW_FIELDS constant at :1)
UploadItemInput {
content_type: string (required)
size_bytes: number (required)
body: Blob (required)
}
CreateNewMemoryPayload {
user_id: string (required)
uploads: Partial<Record<RawField, UploadItemInput[]>>
-- at least one field must be non-empty
-- "text" uploads are explicitly accepted
part_size_bytes?: number (default 5 MB)
signal?: AbortSignal
}
CreateNewMemoryResponse {
memory_id: string
}
Paths
| path | input | output | path-type | notes |
|---|---|---|---|---|
| createNewMemory.text-only | { uploads: { text: [...] } } | frontend client error / 404 | error | Backend endpoints do not exist |
| createNewMemory.mixed | { uploads: { text: [...], photos: [...] } } | frontend client error / 404 | error | Same: no backend Lambda for /memories/uploads |
Pseudocode
createNewMemory(payload):
  validate: at least one uploads field has items
  POST /memories/uploads → start session, get upload_sessions[]
  for each item in uploads[field]:
    POST /memories/uploads/{upload_id}/parts → get presigned URL
    PUT presigned_url ← binary blob
  POST /memories/uploads/{upload_id}/complete
  return memory_id
The frontend zod schema (UploadsSchema in createNewMemory.ts:38-43) explicitly marks text as optional alongside photos, audio, and video. The validation at lines 52-57 requires at least one field to be present but does not require audio or video, so a text-only payload passes client-side validation.
The text field flows through the same multipart-upload machinery as audio and video. No special text-handling logic exists in the frontend; it treats text blobs the same as any other upload item.
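Because text items go through the same multipart machinery, the part arithmetic is the same for them as for audio or video. The sketch below shows one way the split could be computed. It is written in Python for illustration only (the real client is TypeScript), the name plan_parts is hypothetical, and it assumes the documented "5 MB" default means 5 MiB, which is also the S3 minimum size for every part except the last.

```python
from typing import List, Tuple

# Assumption: the documented "5 MB" default is 5 MiB (the S3 minimum part size).
DEFAULT_PART_SIZE_BYTES = 5 * 1024 * 1024


def plan_parts(
    size_bytes: int, part_size_bytes: int = DEFAULT_PART_SIZE_BYTES
) -> List[Tuple[int, int, int]]:
    """Return (part_number, start, end_exclusive) for each part of one upload item.

    Hypothetical helper; it is not taken from the frontend source.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if part_size_bytes <= 0:
        raise ValueError("part_size_bytes must be positive")

    # Integer ceiling division avoids float rounding on very large blobs.
    part_count = (size_bytes + part_size_bytes - 1) // part_size_bytes

    parts = []
    for index in range(part_count):
        start = index * part_size_bytes
        end = min(start + part_size_bytes, size_bytes)
        parts.append((index + 1, start, end))  # S3 part numbers start at 1
    return parts
```

Under these assumptions a typical text note is far smaller than one part, so it would need a single parts request and a single PUT.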
Flow: IngestWindowFunction (existing session-based ingest)
- Core files:
  - main/server/worldmm/pipeline/ingest_window.py
- Template: main/server/template.yaml:402
This is the only deployed Lambda that writes memory segments to the DB. It requires:
| Input field | Required? | Notes |
|---|---|---|
| sessionId | yes | S3 prefix for frames/audio |
| userId | yes | |
| windowIndex | yes | 0-indexed 30-second window |
| frameCount | yes | 0 = audio-only path |
There is an audio-only path (ingest_window.py:638-655): when no frames exist in S3, the function skips GPU enrichment, transcribes audio, generates a title, marks the segment complete, and returns without needing a GPU. (Lines 563-572 are a separate guard that skips processing when frame_count > 0 but frames are missing from S3.) This path handles phone audio-only sessions but still requires audio data uploaded to S3 first.
There is no text-only path in IngestWindowFunction. If frame_count == 0 and no audio is in S3, the segment is created with processing_status=pending and then immediately marked complete with no content (no transcript, no caption, no title).
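The branching described above can be summarized as a small decision function. This is a simplified reading of the behavior reported in this document, not code from ingest_window.py; the enum, the function name, and the boolean parameters are all hypothetical.

```python
from enum import Enum


class IngestPath(Enum):
    """Hypothetical labels for the outcomes described in this section."""

    VIDEO = "video"                                    # frames present, GPU enrichment
    AUDIO_ONLY = "audio_only"                          # no frames, audio transcribed
    SKIPPED_MISSING_FRAMES = "skipped_missing_frames"  # guard for missing frames
    EMPTY_COMPLETE = "empty_complete"                  # marked complete with no content


def select_ingest_path(frame_count: int, frames_in_s3: bool, audio_in_s3: bool) -> IngestPath:
    """Pick the outcome for one window, following the description above."""
    if frame_count > 0:
        # Frames were announced: either they are in S3 or the guard skips processing.
        return IngestPath.VIDEO if frames_in_s3 else IngestPath.SKIPPED_MISSING_FRAMES
    if audio_in_s3:
        return IngestPath.AUDIO_ONLY
    # No frames and no audio: nothing to transcribe or caption.
    return IngestPath.EMPTY_COMPLETE
```

None of the four outcomes reads a text blob, which is the gap a text-only memory would fall into.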
Missing Backend Implementation
The following API routes are called by the frontend but have no corresponding Lambda handler in template.yaml and no app.py in the server tree:
| Route | Frontend caller | Backend status |
|---|---|---|
| POST /memories/uploads | startMemoryUpload (startMemoryUpload.ts:57) | Not implemented; api/memories/uploads/start/ is an empty directory (only __pycache__) |
| POST /memories/uploads/{upload_id}/parts | uploadMemoryPart (uploadMemoryPart.ts) | Not implemented; api/memories/uploads/parts/ is empty |
| POST /memories/uploads/{upload_id}/complete | completeMemoryUpload (completeMemoryUpload.ts:94) | Not implemented; api/memories/uploads/complete/ is empty |
The shared layer defines RAW_UPLOAD_FIELDS = ("text", "photos", "audio", "video") at main/server/layers/shared/python/shared/memory_uploads.py:3, confirming the backend is aware of these field names but has no handler that uses them.
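As an illustration of how a future start handler might use that constant, the sketch below mirrors the frontend rule that at least one upload field must be non-empty. Only the RAW_UPLOAD_FIELDS tuple comes from the codebase; the function name, the error messages, and the assumed shape of the request body are hypothetical.

```python
from typing import Any, List, Mapping, Sequence

# Copied from the shared layer (memory_uploads.py:3) so the sketch is self-contained.
RAW_UPLOAD_FIELDS = ("text", "photos", "audio", "video")


def requested_raw_fields(uploads: Mapping[str, Sequence[Any]]) -> List[str]:
    """Return the non-empty upload fields in canonical order.

    Hypothetical helper: rejects unknown field names and empty requests.
    """
    unknown = sorted(set(uploads) - set(RAW_UPLOAD_FIELDS))
    if unknown:
        raise ValueError(f"unknown upload fields: {unknown}")

    requested = [field for field in RAW_UPLOAD_FIELDS if uploads.get(field)]
    if not requested:
        raise ValueError("at least one upload field must be non-empty")
    return requested
```

With this check a payload containing only a text item is accepted, matching what the frontend schema already allows.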
The useCreateNewMemory hook (main/app/lib/api/memory/useMemoryApi.ts:6) is exported but never called from any screen component — no UI surface currently invokes the text upload flow.
What Would Be Needed to Add Support
- Backend Lambda handlers for the three /memories/uploads routes (start, parts, complete). These need to:
  - start: allocate a memory record and S3 multipart upload sessions for each requested raw_field; return memory_id + upload_sessions[]
  - parts: generate a presigned URL for one S3 multipart part
  - complete: finalize S3 multipart uploads, trigger ingestion
- A text-only ingest path in IngestWindowFunction (or a new Lambda). The existing function has no branch for a memory whose only content is a text blob stored in S3. A new path would need to:
  - Read the text blob from S3
  - Skip audio transcription and GPU captioning
  - Store the raw text as transcript (or a new field) on the segment
  - Run entity/triple extraction from the text directly (currently only done from GPU-generated captions)
- DB schema: worldmm_segments already has a transcript column (Text, nullable) which could hold a text-only note. No schema change is strictly required for a minimal implementation, but a source_type discriminator column would help distinguish session recordings from manual text notes.
- UI surface: useCreateNewMemory needs to be wired to a screen component. Currently it is exported but unused.
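The text-only ingest step listed above could look roughly like the following. It is a sketch under stated assumptions, not an existing code path: SegmentUpdate, ingest_text_only, the "text_note" value, and the injected read_blob callable (standing in for an S3 read) are all hypothetical, and entity/triple extraction is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SegmentUpdate:
    """Hypothetical subset of the fields a text-only path would write."""

    transcript: str
    source_type: str        # proposed discriminator column, not in the schema today
    processing_status: str


def ingest_text_only(read_blob: Callable[[str], bytes], text_key: str) -> SegmentUpdate:
    """Build a segment update from a text blob, skipping transcription and GPU captioning.

    read_blob is injected so the sketch runs without AWS; a real handler
    would fetch the object from S3 instead.
    """
    text = read_blob(text_key).decode("utf-8").strip()
    if not text:
        # Avoid repeating the existing failure mode of a complete segment with no content.
        raise ValueError("text upload is empty")
    return SegmentUpdate(
        transcript=text,
        source_type="text_note",
        processing_status="complete",
    )
```

Storing the note in transcript keeps the minimal version free of schema changes; the source_type value only becomes meaningful if the discriminator column is added.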
Logs
| Source | Location |
|---|---|
| IngestWindowFunction | CloudWatch: /aws/lambda/{stack}-IngestWindowFunction |
| AudioUploadCompleteFunction | CloudWatch: /aws/lambda/{stack}-AudioUploadCompleteFunction |
| Frontend upload flow | createFlowLogger("upload-memory") — client-side console/telemetry |
Deployment
- Mechanism: SAM
- Deploy command:
- Notes: The /memories/uploads endpoints are not registered in template.yaml and are not deployed. Any new handlers must be added both as Lambda functions in template.yaml and as app.py files under main/server/api/memories/uploads/.