Sessions Audio Upload — Presigned PUT Migration

Plan Metadata

Plan type: plan
Parent plan: N/A
Depends on: #479 (boto3 client S3_CONFIG / S3_UPLOAD_CONFIG in shared layer) — bridges audio path until this lands
Status: draft

Status semantics:

draft: Plan is being created or updated and is not final.
approved: Plan is approved but not yet applied in code.
documentation: Code currently exists and matches the plan contract.

System Intent

What is being built: Replace the synchronous body-bearing POST /sessions/{id}/audio?windowIndex=N handler with a presigned-PUT URL grant + S3 ObjectCreated event-driven completion handler (delivered via EventBridge). Removes the 29s API-Gateway-cap failure class for large audio chunks. Splits one synchronous handler into two single-responsibility handlers connected by an S3 event source.
Primary consumer(s): SessionAudioFunction (rewritten — URL minter only); AudioUploadCompleteFunction (new — S3-event-driven bookkeeping + ingest trigger); IngestWindowFunction (unchanged — receives async invoke from completion handler); React Native client capture-session.ts (rewritten audio uploadFn).
Boundary (black-box scope only): Frame upload path (sessions/frames) is out of scope — body is always small enough to fit the 29s cap. IngestWindowFunction internals are out of scope — only its async-invoke entry contract matters here.

Motivation

sessions/audio is the only API-GW-fronted Lambda in the codebase that routes a multi-MB request body through Lambda memory. WAV audio chunks (~2 MB per 60s window for the chunked path; tens of MB for the legacy full-session path) routinely take longer than 8s to reach S3 on cell networks. API Gateway hard-caps requests at 29s and cannot be configured higher. Result: the slowest legitimate uploads silently 504, audio chunks are lost, and the user's session has gaps.

Goals:

Eliminate the 29s API-Gateway-cap failure class for audio uploads at any realistic body size.
Preserve atomicity semantics from the client queue's POV — each audio item still either fully uploads or parks for retry, no half-states.
Preserve the exactly-once ingest trigger guarantee under concurrent frame+audio window completion.
No orphan S3 objects — if it's in S3, it gets registered.

Non-goals:

Migrating sessions/frames — body is small (~100 KB JPEG), current path fits well within 29s cap, doubling round trips would hurt without gain.
Migrating memories/video large multipart I/O — tracked separately in #494.
Repository-wide enforcement of boto3.client(..., config=...) convention — tracked in #495.
Backward compatibility with the legacy _handle_legacy_upload full-session path — there are no users today; the path is deleted entirely.
Property-based, real-AWS-integration, and load tests — deferred to #496, #497, #498 with explicit resolution triggers.

Architecture

                                          ┌──────────────────────┐
                                          │ SessionAudioFunction │
  ┌──────────────┐   POST /sessions/{id}  │ (existing, rewritten)│
  │  RN client   │ ──────────────────────▶│                      │
  │              │  {sizeBytes}           │ Generates presigned  │
  │              │ ◀──────────────────────│ PUT URL              │
  │              │  {url, s3Key, ...}     └──────────────────────┘
  │              │
  │              │  PUT <presigned URL>   ┌──────────────────────┐
  │              │ ──────────────────────▶│        S3            │
  │              │      audio bytes       │ sessions/{id}/       │
  │              │ ◀───────────  200 OK   │ window_NNN/audio.wav │
  └──────────────┘                        └──────────┬───────────┘
                                                     │ ObjectCreated event
                                                     │ (via EventBridge)
                                                     ▼
                                          ┌──────────────────────┐
                                          │ AudioUploadComplete  │
                                          │ Function (NEW)       │
                                          │                      │
                                          │ - parse S3 key       │
                                          │ - DDB ADD            │
                                          │   completedAudio     │
                                          │   Windows            │
                                          │ - claim+trigger      │
                                          │   ingest             │
                                          └──────────┬───────────┘
                                                     │ invoke async (Event)
                                                     ▼
                                          ┌──────────────────────┐
                                          │ IngestWindowFunction │
                                          │ (unchanged)          │
                                          └──────────────────────┘

Three Lambdas, one new. Two AWS surfaces: presigned URL grant (API GW) + S3 ObjectCreated event (S3 → EventBridge → Lambda). No new HTTP endpoints from the client's POV — same POST /sessions/{id}/audio?windowIndex=N URL, different response shape.

Endpoint contracts

`POST /sessions/{id}/audio?windowIndex=N` (rewritten `SessionAudioFunction`)

Request body (JSON):

{"sizeBytes": 1923456}

Response 200:

{
  "url": "https://encache-raw-memory.s3.amazonaws.com/sessions/abc/window_007/audio.wav?X-Amz-Algorithm=...",
  "s3Key": "sessions/abc/window_007/audio.wav",
  "windowIndex": 7,
  "expiresIn": 300
}

Response 400:

sessionId path param missing
windowIndex missing, negative, or non-integer
sizeBytes missing, ≤ 0, or > 10 MB cap

Response 404:

sessionId not in DynamoDB
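The 400 rules above could be checked along these lines. This is a minimal sketch, not the handler's actual code: the function name `validate_grant_request`, the `id` path-parameter name, the error payload shape, and reading "10 MB" as 10 MiB are all assumptions. The DynamoDB session lookup behind the 404 is left out.

```python
import json
import re
from typing import Mapping, Optional, Tuple

MAX_AUDIO_BYTES = 10 * 1024 * 1024  # assumption: the "10 MB cap" is read as 10 MiB


def _parse_query_int(raw: Optional[str]) -> Optional[int]:
    """Return the query-string value as an int, or None if it is not a plain integer."""
    if raw is None or re.fullmatch(r"-?[0-9]+", raw) is None:
        return None
    return int(raw)


def validate_grant_request(
    path_params: Optional[Mapping[str, str]],
    query_params: Optional[Mapping[str, str]],
    raw_body: Optional[str],
) -> Tuple[int, dict]:
    """Return (status, payload); status 200 means the request is well-formed."""
    session_id = (path_params or {}).get("id")  # assumed path parameter name
    if not session_id:
        return 400, {"error": "sessionId path param missing"}

    window_index = _parse_query_int((query_params or {}).get("windowIndex"))
    if window_index is None or window_index < 0:
        return 400, {"error": "windowIndex missing, negative, or non-integer"}

    try:
        body = json.loads(raw_body or "{}")
    except json.JSONDecodeError:
        body = {}
    size_bytes = body.get("sizeBytes") if isinstance(body, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(size_bytes, int) or isinstance(size_bytes, bool):
        return 400, {"error": "sizeBytes missing or not an integer"}
    if size_bytes <= 0 or size_bytes > MAX_AUDIO_BYTES:
        return 400, {"error": "sizeBytes must be > 0 and within the 10 MB cap"}

    return 200, {"sessionId": session_id, "windowIndex": window_index, "sizeBytes": size_bytes}
```

Keeping validation a pure function lets each 400 case in the test inventory be asserted without any AWS mocks.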

Removed from this handler: legacy full-session branch (_handle_legacy_upload), body decoding, S3 PUT, DDB update, ingest trigger.

`PUT <presigned URL>` (client → S3, no Lambda)

Headers (all must match URL binding exactly):

Content-Type: audio/wav
Content-Length: <sizeBytes>

Body: raw WAV bytes. URL expires after 300s.
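The URL grant itself might look roughly like the sketch below. It assumes the handler uses boto3's `generate_presigned_url` with `put_object` parameters. The function name `mint_audio_upload_url` is hypothetical, and the S3 client is passed in so the logic can be exercised without AWS credentials. Whether S3 enforces the signed Content-Length exactly as this plan expects is something to confirm against real S3, not something this sketch demonstrates.

```python
from typing import Any, Dict

BUCKET = "encache-raw-memory"
URL_TTL_SECONDS = 300


def mint_audio_upload_url(
    s3_client: Any, s3_key: str, window_index: int, size_bytes: int
) -> Dict[str, Any]:
    """Ask the S3 client for a presigned PUT URL carrying the expected headers."""
    url = s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={
            "Bucket": BUCKET,
            "Key": s3_key,
            # The client must send these same header values on the PUT.
            "ContentType": "audio/wav",
            "ContentLength": size_bytes,
        },
        ExpiresIn=URL_TTL_SECONDS,
        HttpMethod="PUT",
    )
    # Response shape follows the "Response 200" contract in this plan.
    return {
        "url": url,
        "s3Key": s3_key,
        "windowIndex": window_index,
        "expiresIn": URL_TTL_SECONDS,
    }
```

In the Lambda, `s3_client` would be the shared-layer boto3 S3 client; in unit tests a stub with a `generate_presigned_url` method is enough to check the parameters.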

`AudioUploadCompleteFunction` (new, S3-event triggered via EventBridge)

Event source: S3 ObjectCreated event on bucket encache-raw-memory, delivered through the default EventBridge bus with a SAM-managed EventBridgeRule filtering on detail.bucket.name = encache-raw-memory, detail.object.key prefix=sessions/, detail.object.key suffix=audio.wav. EventBridge delivers one event per invocation (no Records wrapper); the handler reads event["detail"]["object"]["key"].
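As an illustration of the event shape described above, the sketch below pulls the object key out of an EventBridge-delivered S3 event and re-checks the bucket, prefix, and suffix defensively. The helper name `extract_audio_key` and the trimmed sample event are assumptions; a real event carries more fields than shown here.

```python
from typing import Any, Mapping, Optional

EXPECTED_BUCKET = "encache-raw-memory"


def extract_audio_key(event: Mapping[str, Any]) -> Optional[str]:
    """Return the object key from an EventBridge S3 event, or None if it is not an audio upload."""
    detail = event.get("detail") or {}
    bucket = (detail.get("bucket") or {}).get("name")
    key = (detail.get("object") or {}).get("key")
    if bucket != EXPECTED_BUCKET or not isinstance(key, str):
        return None
    # Mirror the rule's prefix/suffix filter so a misconfigured rule cannot feed us other keys.
    if not (key.startswith("sessions/") and key.endswith("audio.wav")):
        return None
    return key


# Trimmed-down sample event (illustrative values only).
sample_event = {
    "version": "0",
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "encache-raw-memory"},
        "object": {"key": "sessions/abc/window_007/audio.wav", "size": 1923456},
    },
}

print(extract_audio_key(sample_event))  # sessions/abc/window_007/audio.wav
```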

Behavior:

Parse key matching sessions/{sessionId}/window_{NNN}/audio.wav regex; extract sessionId and windowIndex.
If parse fails: log step: audio_upload_malformed_key and return success. Do not fail the invocation: a failure would only make AWS retry an event that can never parse and then park it in the DLQ as noise.
DDB update_item ADD completedAudioWindows :win, return ALL_NEW.
If session row missing: log step: audio_upload_unknown_session, return success (session deleted mid-flight — orphan stays in S3 as cold storage).
Read captureMode and completedFrameWindows from returned attrs.
Compute should_trigger: captureMode == "audio_only" OR windowIndex in completedFrameWindows.
If should_trigger: call _claim_and_trigger — DDB conditional ADD ingestTriggeredWindows, then Lambda Invoke(InvocationType="Event"). If the invoke fails, the claim is rolled back via DDB DELETE ingestTriggeredWindows so the AWS async-invoke retry can re-claim and re-invoke (without rollback, the retry's conditional would silently reject and drop the window).
Log step transitions matching existing flow: audio_post shape with step: audio_upload_registered.

DLQ: dedicated SQS queue AudioUploadCompleteDLQ, 2 AWS async-invoke retries before DLQ. Manual replay path documented in runbook.
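The behavior steps above can be sketched as pure helpers plus a small claim/invoke/rollback routine. Everything here is illustrative: the helper names, the `sessionId` key attribute, the `"audio_video"` mode string used in the usage note, and the exact condition expressions are assumptions, and the real `_claim_and_trigger` may differ. One detail worth noting: DynamoDB `update_item` creates the item when it does not exist, so this sketch adds an `attribute_exists` condition to detect a deleted session; the plan does not say how the missing row is detected.

```python
import logging
from typing import Any, Callable, Dict, Iterable

logger = logging.getLogger(__name__)
TABLE_NAME = "encache-sessions"


def should_trigger(capture_mode: str, window_index: int,
                   completed_frame_windows: Iterable[int]) -> bool:
    """Ingest is due when audio is the only stream or the frames already landed."""
    return capture_mode == "audio_only" or window_index in set(completed_frame_windows)


def _window_update(session_id: str, expression: str, window_index: int) -> Dict[str, Any]:
    """Common low-level update_item kwargs; the key attribute name is an assumption."""
    return {
        "TableName": TABLE_NAME,
        "Key": {"sessionId": {"S": session_id}},
        "UpdateExpression": expression,
        "ExpressionAttributeValues": {":win": {"NS": [str(window_index)]}},
    }


def register_audio_kwargs(session_id: str, window_index: int) -> Dict[str, Any]:
    """ADD the window to completedAudioWindows, failing if the session row is gone."""
    kwargs = _window_update(session_id, "ADD completedAudioWindows :win", window_index)
    kwargs["ConditionExpression"] = "attribute_exists(sessionId)"
    kwargs["ReturnValues"] = "ALL_NEW"
    return kwargs


def claim_kwargs(session_id: str, window_index: int) -> Dict[str, Any]:
    """Conditional ADD: succeeds only for the first caller to claim this window."""
    kwargs = _window_update(session_id, "ADD ingestTriggeredWindows :win", window_index)
    kwargs["ConditionExpression"] = "NOT contains(ingestTriggeredWindows, :one)"
    kwargs["ExpressionAttributeValues"][":one"] = {"N": str(window_index)}
    return kwargs


def rollback_kwargs(session_id: str, window_index: int) -> Dict[str, Any]:
    """DELETE the window from ingestTriggeredWindows so a retry can claim again."""
    return _window_update(session_id, "DELETE ingestTriggeredWindows :win", window_index)


def claim_and_trigger(claim: Callable[[], bool], invoke: Callable[[], None],
                      rollback: Callable[[], None]) -> bool:
    """Return True if this caller claimed and invoked; re-raise if the invoke failed."""
    if not claim():
        return False  # another handler already claimed the window
    try:
        invoke()
    except Exception:
        try:
            rollback()
        except Exception:
            logger.error("ingest_claim_rollback_failed")
        raise  # the original invoke failure still propagates for the AWS retry
    return True
```

For example, `should_trigger("audio_video", 7, [6])` is False because the frames for window 7 have not landed, while `should_trigger("audio_only", 7, [])` is True. The callables passed to `claim_and_trigger` would wrap the DynamoDB and Lambda clients in the real handler.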

S3 key shape

sessions/{sessionId}/window_{N:03d}/audio.wav
Server-determined, never client-supplied. Prevents path traversal into other sessions.
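A small sketch of building and parsing this key shape follows. The function names and the rule that `sessionId` may not contain a slash are assumptions made for illustration; the format string itself comes straight from the key shape above.

```python
import re
from typing import Optional, Tuple

# Matches sessions/{sessionId}/window_{NNN}/audio.wav with a zero-padded index.
_AUDIO_KEY_RE = re.compile(
    r"^sessions/(?P<session_id>[^/]+)/window_(?P<window>[0-9]{3,})/audio\.wav$"
)


def build_audio_key(session_id: str, window_index: int) -> str:
    """Build the server-determined key; reject inputs that would change the path depth."""
    if not session_id or "/" in session_id:
        raise ValueError("session_id must be non-empty and contain no '/'")
    if window_index < 0:
        raise ValueError("window_index must be >= 0")
    return f"sessions/{session_id}/window_{window_index:03d}/audio.wav"


def parse_audio_key(key: str) -> Optional[Tuple[str, int]]:
    """Return (session_id, window_index), or None when the key does not match."""
    match = _AUDIO_KEY_RE.match(key)
    if match is None:
        return None
    return match.group("session_id"), int(match.group("window"))


print(build_audio_key("abc", 7))                             # sessions/abc/window_007/audio.wav
print(parse_audio_key("sessions/abc/window_007/audio.wav"))  # ('abc', 7)
```

Returning None instead of raising fits the completion handler's rule that a malformed key is logged and treated as success.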

Data flow

Happy path (one window)

Client queue picks audio item from manifest.
POST /sessions/{id}/audio?windowIndex=7 with {sizeBytes: 1923456}.
SessionAudioFunction validates session exists in DDB, generates URL.
Returns {url, s3Key, windowIndex: 7, expiresIn: 300}.
Client PUTs raw bytes to url with Content-Length: 1923456 and Content-Type: audio/wav.
S3 responds 200 OK.
Client marks item uploaded in dedup index, removes from queue.
S3 fires ObjectCreated event to EventBridge (typically <1s, up to 30s). EventBridge routes to AudioUploadCompleteFunction via the SAM-managed S3AudioCreated rule.
AudioUploadCompleteFunction:
Parses key → (id, 7).
DDB ADD 7 to completedAudioWindows.
Checks captureMode + completedFrameWindows.
If ready: _claim_and_trigger fires IngestWindowFunction async.

Failure modes

Where	What	Recovery
Step 2 (POST URL grant)	5xx, network error	Client queue retries — existing `PersistentUploadQueue` logic, unchanged
Step 2	400 invalid `sessionId`/`windowIndex`/`sizeBytes`	Client gives up, item parked, surfaced via `parkedCount` metric
Step 5 (PUT to S3)	403 `SignatureDoesNotMatch`	Client altered headers, clock skew, or Content-Length mismatch: park + retry from step 2 (fresh URL)
Step 5	5xx from S3	Retry PUT with same URL until expiry; after expiry restart from step 2
Step 5	URL expired (>5 min)	Restart from step 2: client requests fresh URL
Step 8 (S3 event)	S3 event fails to deliver to Lambda	AWS retries automatically; after 2 failures → DLQ
Step 9 (`AudioUploadComplete`)	DDB throttle	Lambda exception → AWS async retry (2x) → DLQ
Step 9	`sessionId` resolves to no DDB row (deleted mid-flight)	Log `audio_upload_unknown_session`, return success; object stays as cold storage
Step 9	`_claim_and_trigger` ingest invoke fails	Claim rolled back via DDB `DELETE ingestTriggeredWindows`. Lambda exception → AWS async retry (now able to re-claim). After 2 retry failures → DLQ.
Step 9	Claim rollback itself fails (DDB throttle during DELETE)	Log `ingest_claim_rollback_failed`. Original invoke failure still propagates. Manual cleanup: operator deletes window from `ingestTriggeredWindows` set to unblock retry.
Step 9	Duplicate S3 event (same key fires twice)	DDB `ADD` is idempotent (set semantics); `_claim_and_trigger` conditional dedups ingest invoke

Race: frame vs audio window completion

Unchanged from today. Both AudioUploadCompleteFunction (was SessionAudioFunction) and FramesFunction race to check completedFrameWindows ∩ completedAudioWindows and call _claim_and_trigger. The DDB conditional in _claim_and_trigger ensures exactly one wins. No new race introduced.
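The exactly-one-winner property can be illustrated with an in-memory stand-in. In this sketch a lock plays the role of DynamoDB's atomic conditional write; the class and function names are hypothetical and no AWS calls are made.

```python
import threading
from typing import List, Set


class InMemoryClaims:
    """Stand-in for the conditional ADD on ingestTriggeredWindows."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claimed: Set[int] = set()

    def try_claim(self, window_index: int) -> bool:
        """Atomically claim the window; only the first caller gets True."""
        with self._lock:
            if window_index in self._claimed:
                return False
            self._claimed.add(window_index)
            return True


def run_race(window_index: int, contenders: int = 2) -> int:
    """Start several handlers at once and count how many would invoke ingest."""
    claims = InMemoryClaims()
    invocations: List[str] = []
    barrier = threading.Barrier(contenders)

    def handler(name: str) -> None:
        barrier.wait()  # release all contenders together to force contention
        if claims.try_claim(window_index):
            invocations.append(name)  # stands in for the async Lambda invoke

    threads = [
        threading.Thread(target=handler, args=(f"handler-{i}",))
        for i in range(contenders)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(invocations)


print(run_race(7))  # 1
```

However the threads interleave, only one `try_claim` call returns True, which is the behavior the concurrent-completion test in the inventory is meant to pin down.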

Orphan handling

The S3 event is the source of truth, so no uploaded object goes unnoticed. Every audio object written to S3 fires an event; every event reaches the Lambda; the Lambda either registers the window, logs and skips it (malformed key, or session deleted mid-flight), or fails into the DLQ. DLQ contents are an operational signal, not data loss (manual replay).

Client queue contract

The queue's uploadFn currently makes one POST per audio item. After migration:

async function uploadAudio(item) {
  // sessionId and api are assumed to come from the enclosing capture-session scope.
  const file = new File(toFileUri(item.uri));
  const bytes = await file.bytes();
  const sizeBytes = bytes.byteLength;

  // First request: ask the server for a presigned PUT URL bound to this exact size.
  const { url } = await api.post(
    `/sessions/${sessionId}/audio?windowIndex=${item.windowIndex}`,
    JSON.stringify({ sizeBytes }),
    { headers: { "Content-Type": "application/json" }, timeout: 5000 },
  );

  // Second request: PUT the raw bytes straight to S3, aborting after 60s.
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 60000);
  try {
    const resp = await fetch(url, {
      method: "PUT",
      // These headers must match the values the URL was signed with.
      headers: { "Content-Type": "audio/wav", "Content-Length": String(sizeBytes) },
      body: bytes,
      signal: controller.signal,
    });
    // Throwing hands the item back to the queue's retry policy.
    if (!resp.ok) throw new Error(`S3 PUT failed: ${resp.status}`);
  } finally {
    clearTimeout(timeoutId);
  }
}

The 60s AbortController on the PUT prevents a stalled cell upload from blocking the upload queue indefinitely. Without it, the queue's per-item retry policy never fires because the fetch never resolves or rejects.

If either step throws, the queue parks the item and retries with full reset (fresh URL). uploadFn atomicity preserved.

Testing strategy (no test theater)

Discipline:

Spec → test → code, in that order. Every test cites a Failure-Modes table row or Happy-Path step as its justification. Tests written before the corresponding production code exists.
Test the contract, not the call. Don't rely on mock_s3.put_object.assert_called_with(...) when the input to that mock is the same constant the production code uses: that's circular. Assert on observable state: object exists in S3 with expected key+size, DDB row contains expected attribute, ingest Lambda received expected payload.
Mock only at the AWS edge. Use moto for S3+DDB+Lambda, not handcrafted mocks of boto3 clients. moto enforces real S3 semantics (key uniqueness, event payload shape, signature binding) so passing tests are stronger evidence.
One failure mode = one named test. Test name describes the behavior under test, not the function being tested.
Red-green-refactor literally. Each test added to the suite runs red, then a minimal code change makes it green, then refactor if needed.

Test inventory

SessionAudioFunction (URL minter) — unit, moto for DDB session lookup:

test_returns_presigned_url_with_correct_s3_key_for_valid_request
test_url_binds_content_type_audio_wav
test_url_binds_exact_content_length_from_size_bytes
test_url_expires_in_300_seconds
test_returns_400_when_session_id_path_param_missing
test_returns_400_when_window_index_missing
test_returns_400_when_window_index_negative
test_returns_400_when_window_index_non_integer
test_returns_400_when_size_bytes_missing
test_returns_400_when_size_bytes_zero_or_negative
test_returns_400_when_size_bytes_exceeds_10mb_cap
test_returns_404_when_session_id_not_in_dynamodb

AudioUploadCompleteFunction (S3 event handler) — unit, moto for DDB+Lambda:

test_happy_path_audio_only_mode_triggers_ingest_immediately
test_audio_video_mode_triggers_ingest_only_when_frames_ready
test_audio_video_mode_does_not_trigger_when_frames_pending
test_adds_window_index_to_completed_audio_windows_in_ddb
test_duplicate_s3_event_does_not_double_trigger_ingest
test_malformed_s3_key_logs_and_returns_success
test_unknown_session_id_logs_session_not_found_and_returns_success
test_ddb_throttle_raises_for_aws_retry
test_ingest_invoke_failure_raises_for_aws_retry
test_concurrent_audio_and_frame_completion_invoke_ingest_exactly_once

Integration (end-to-end with moto, single test file):

test_full_upload_flow_writes_audio_and_triggers_ingest
test_url_signature_rejects_wrong_content_length
test_url_signature_rejects_wrong_content_type
test_expired_url_returns_403_on_put

Client (capture-session.ts uploadFn) — Jest:

uploads_audio_when_both_url_grant_and_put_succeed
parks_item_when_url_grant_returns_5xx
parks_item_when_put_to_s3_returns_5xx
parks_item_when_put_returns_403_signature_mismatch
refetches_url_after_5xx_retry_does_not_reuse_stale_url
does_not_remove_item_from_queue_if_put_throws

Deferred test scope

Filed as separate issues with explicit resolution triggers:

#496: property-based tests (hypothesis + fast-check); trigger: 2 weeks green CI with real users, OR regression slip, OR contract refactor.
#497: real-AWS integration test (not moto); trigger: boto3 major bump, OR presigned URL param change, OR moto-vs-real divergence.
#498: load test for AudioUploadCompleteFunction; trigger: ~4000 concurrent recording users projected, OR first DDB throttle alarm, OR first DLQ depth > 0, OR pre-launch event.

Done criterion for testing

All tests above written and failing before any production code for them exists. Git history shows test commits preceding implementation commits.
Final suite: 100% green, no skips, no xfails.
Mutation check on critical paths: reverting the corresponding production logic causes the test to fail (guards against tests that pass for the wrong reason).

Deployment + infrastructure

CloudFormation / SAM changes (`main/server/template.yaml`)

Resource	Action	Notes
`SessionAudioFunction` (existing)	Modify — keep route, rewrite handler	Smaller code; can drop `MemorySize` if profile shows reduced footprint
`AudioUploadCompleteFunction`	New	`Runtime: python3.12`, `Timeout: 30`, `MemorySize: 256`, layers: `SharedLayer`, dedicated DLQ
`AudioUploadCompleteDLQ`	New	`Type: AWS::SQS::Queue`, redrive from Lambda (2 retries → DLQ)
`AudioUploadCompletePermission`	New	`AWS::Lambda::Permission` granting `s3.amazonaws.com` invoke rights on `AudioUploadCompleteFunction`
`SessionAudioFunction.Policies`	Modify	Drop DDB `update_item` on `completedAudioWindows` and `lambda:InvokeFunction` on `IngestWindowFunction` — no longer needed
`AudioUploadCompleteFunction.Policies`	New	`DatabaseSsmPolicy`, DDB update on `encache-sessions`, `lambda:InvokeFunction` for `IngestWindowFunction`

Note: S3 ObjectCreated notification on encache-raw-memory is wired in Terraform (main/devops/main.tf) — the bucket is Terraform-managed, not CloudFormation. SAM contributes only AudioUploadCompleteFunction + AudioUploadCompleteDLQ + AudioUploadCompletePermission (Lambda permission grant for s3.amazonaws.com).

Terraform changes (`main/devops/main.tf`)

Resource	Action	Notes
`data "aws_lambda_function" "audio_upload_complete"`	New	Looks up the SAM-deployed Lambda by deterministic name `server-AudioUploadCompleteFunction`
`aws_s3_bucket_notification.raw_data_audio`	New	Wires `s3:ObjectCreated:*` on prefix `sessions/`, suffix `audio.wav` to the looked-up Lambda

IAM principle

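SessionAudioFunction loses its DDB write and Lambda invoke permissions, a strict downgrade. It keeps s3:PutObject on the bucket, because a presigned URL only carries the permissions of the role that signed it. AudioUploadCompleteFunction gets only the DDB and Lambda invoke permissions that were removed. Net IAM surface unchanged but cleanly separated by responsibility.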

Deploy order

cd main/server && sam deploy — creates the Lambda + permission grant. Lambda name pinned via FunctionName: !Sub "${AWS::StackName}-AudioUploadCompleteFunction" so the Terraform lookup is deterministic.
cd main/devops && terraform apply — wires the notification. Reverse order fails: data lookup errors if the Lambda doesn't exist.

With zero users today, no canary or phased rollout is required.

Observability

New CloudWatch alarms:

AudioUploadCompleteFunction Errors > 0 over 5 min → Discord webhook
AudioUploadCompleteDLQ ApproximateNumberOfMessagesVisible > 0 → Discord webhook
SessionAudioFunction Errors > 5/min → Discord webhook (URL grant failures are user-facing)

Log grep patterns for runbook:

"step": "audio_url_granted" — successful URL mint
"step": "audio_upload_registered" — S3 event handler success
"step": "audio_upload_unknown_session" — session deleted mid-flight; rare
"step": "audio_upload_malformed_key" — should never happen; investigate immediately
"step": "ingest_claim_lost" — race between audio+frame triggers; harmless
"step": "ingest_claim_rolled_back" — invoke failed; claim removed so AWS retry can re-claim. Expected to be rare; spike indicates ingest Lambda health issues.
"step": "ingest_claim_rollback_failed" — both invoke AND rollback failed. Investigate immediately; window is stuck (claim recorded but ingest never fired and retry will see the claim).

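A log helper that produces lines matching the grep patterns above might look like this. It assumes the existing logs are single-line JSON with the default `json.dumps` separators, which is what makes `"step": "..."` greppable; the helper names and the extra fields are illustrative.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("audio_upload")


def format_step(step: str, **fields: Any) -> str:
    """Serialize one log record as a JSON line with "step" as the first key."""
    return json.dumps({"step": step, **fields})


def log_step(step: str, **fields: Any) -> None:
    """Emit the step record at INFO level."""
    logger.info(format_step(step, **fields))


line = format_step("audio_upload_registered", sessionId="abc", windowIndex=7)
print(line)
# {"step": "audio_upload_registered", "sessionId": "abc", "windowIndex": 7}
```

If a handler switched to compact separators, the space after the colon would disappear and the runbook patterns would stop matching, so the serializer settings are part of the contract.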
Cutover

No flag, no canary, no gradual rollout (no users). Ship via standard sam deploy.

Post-deploy verification:

Run integration test suite against deployed stack (not just moto).
Manually exercise capture flow on a test device → confirm audio chunk lands in S3 + DDB updates + ingest fires.
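Trigger an artificial failure inside AudioUploadCompleteFunction (for example, make the ingest invoke fail) → confirm the DLQ receives the event after retries. A bad URL signature cannot exercise this path: S3 rejects the PUT with 403, so no object is created and no event fires.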

Rollback

git revert + sam deploy. Old handler code is restored; the S3 notification is removed because the new function no longer exists. Caveat: any audio chunks PUT to S3 between rollback decision and rollback completion will not be registered (handler is gone). Acceptable risk given the zero-user scenario.

Out of scope

sessions/frames migration — body is small (~100 KB), current path fits 29s cap, doubling round trips would hurt.
memories/video large multipart I/O — tracked in #494.
Repository-wide enforcement of boto3.client(..., config=...) — tracked in #495.
Backward-compatible legacy _handle_legacy_upload path — deleted entirely; no users today.
Property-based tests — #496.
Real-AWS integration tests — #497.
Load tests — #498.

Files affected

Server (Python):

main/server/api/sessions/audio/app.py — rewrite (URL minter only)
main/server/events/audio_upload_complete/app.py — new Lambda handler. New top-level events/ directory introduced for S3/SNS/EventBridge-triggered handlers (no existing event-driven Lambda in repo, so no precedent — clean separation from api/ for HTTP and worldmm/ for the ingest pipeline).
main/server/template.yaml — new function + DLQ + S3 notification + IAM changes
main/server/tests/unit/test_session_audio_url_minter.py — new unit tests per Test Inventory (follows repo convention of centralized tests/unit/, not per-handler __tests__/)
main/server/tests/unit/test_audio_upload_complete.py — new unit tests per Test Inventory
main/server/tests/integration/test_audio_presigned_flow.py — new end-to-end tests

Client (TypeScript):

main/app/lib/capture-session.ts — rewrite audio branch of uploadFn (lines 103-112)
main/app/lib/__tests__/capture-session.test.ts — new tests per Test Inventory

Docs:

docs/plans/2026-05-19-sessions-audio-presigned.md — this file
docs/docs/ — update affected system docs once code lands (separate pass)