Sessions Audio Upload — Presigned PUT Migration

Plan Metadata

  • Plan type: plan
  • Parent plan: N/A
  • Depends on: #479 (boto3 client S3_CONFIG / S3_UPLOAD_CONFIG in shared layer) — bridges audio path until this lands
  • Status: draft

Status semantics:

  • draft: Plan is being created or updated and is not final.
  • approved: Plan is approved but not yet applied in code.
  • documentation: Code currently exists and matches the plan contract.

System Intent

  • What is being built: Replace the synchronous body-bearing POST /sessions/{id}/audio?windowIndex=N handler with a presigned-PUT URL grant + S3 ObjectCreated event-driven completion handler (delivered via EventBridge). Removes the 29s API-Gateway-cap failure class for large audio chunks. Splits one synchronous handler into two single-responsibility handlers connected by an S3 event source.
  • Primary consumer(s): SessionAudioFunction (rewritten — URL minter only); AudioUploadCompleteFunction (new — S3-event-driven bookkeeping + ingest trigger); IngestWindowFunction (unchanged — receives async invoke from completion handler); React Native client capture-session.ts (rewritten audio uploadFn).
  • Boundary (black-box scope only): Frame upload path (sessions/frames) is out of scope — body is always small enough to fit the 29s cap. IngestWindowFunction internals are out of scope — only its async-invoke entry contract matters here.

Motivation

sessions/audio is the only API-GW-fronted Lambda in the codebase that routes a multi-MB request body through Lambda memory. WAV audio chunks (~2 MB per 60s window for the chunked path; tens of MB for the legacy full-session path) routinely take longer than 8s to transit S3 on cell networks. API Gateway hard-caps requests at 29s and cannot be configured higher. Result: the slowest legitimate uploads silently 504, audio chunks are lost, and the user's session has gaps.

Goals:

  1. Eliminate the 29s API-Gateway-cap failure class for audio uploads at any realistic body size.
  2. Preserve atomicity semantics from the client queue's POV — each audio item still either fully uploads or parks for retry, no half-states.
  3. Preserve the exactly-once ingest trigger guarantee under concurrent frame+audio window completion.
  4. No orphan S3 objects — if it's in S3, it gets registered.

Non-goals:

  1. Migrating sessions/frames — body is small (~100 KB JPEG), current path fits well within 29s cap, doubling round trips would hurt without gain.
  2. Migrating memories/video large multipart I/O — tracked separately in #494.
  3. Repository-wide enforcement of boto3.client(..., config=...) convention — tracked in #495.
  4. Backward compatibility with the legacy _handle_legacy_upload full-session path — there are no users today; the path is deleted entirely.
  5. Property-based, real-AWS-integration, and load tests — deferred to #496, #497, #498 with explicit resolution triggers.

Architecture

                                          ┌──────────────────────┐
                                          │ SessionAudioFunction │
  ┌──────────────┐   POST /sessions/{id}  │ (existing, rewritten)│
  │  RN client   │ ──────────────────────▶│                      │
  │              │  {sizeBytes}           │ Generates presigned  │
  │              │ ◀──────────────────────│ PUT URL              │
  │              │  {url, s3Key, ...}     └──────────────────────┘
  │              │
  │              │  PUT <presigned URL>   ┌──────────────────────┐
  │              │ ──────────────────────▶│        S3            │
  │              │      audio bytes       │ sessions/{id}/       │
  │              │ ◀───────────  200 OK   │ window_NNN/audio.wav │
  └──────────────┘                        └──────────┬───────────┘
                                                     │ ObjectCreated event
                                                     │ (via EventBridge)
                                          ┌──────────────────────┐
                                          │ AudioUploadComplete  │
                                          │ Function (NEW)       │
                                          │                      │
                                          │ - parse S3 key       │
                                          │ - DDB ADD            │
                                          │   completedAudio     │
                                          │   Windows            │
                                          │ - claim+trigger      │
                                          │   ingest             │
                                          └──────────┬───────────┘
                                                     │ invoke async (Event)
                                          ┌──────────────────────┐
                                          │ IngestWindowFunction │
                                          │ (unchanged)          │
                                          └──────────────────────┘

Three Lambdas, one new. Two AWS surfaces: presigned URL grant (API GW) + S3 ObjectCreated event (S3 → EventBridge → Lambda). No new HTTP endpoints from the client's POV — same POST /sessions/{id}/audio?windowIndex=N URL, different response shape.

Endpoint contracts

POST /sessions/{id}/audio?windowIndex=N (rewritten SessionAudioFunction)

Request body (JSON):

{"sizeBytes": 1923456}

Response 200:

{
  "url": "https://encache-raw-memory.s3.amazonaws.com/sessions/abc/window_007/audio.wav?X-Amz-Algorithm=...",
  "s3Key": "sessions/abc/window_007/audio.wav",
  "windowIndex": 7,
  "expiresIn": 300
}

Response 400 (validation failure):

  • sessionId path param missing
  • windowIndex missing, negative, or non-integer
  • sizeBytes missing, ≤ 0, or > 10 MB cap

Response 404:

  • sessionId not in DynamoDB

Removed from this handler: legacy full-session branch (_handle_legacy_upload), body decoding, S3 PUT, DDB update, ingest trigger.
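The validation rules above can be sketched as a pure function. This is a minimal illustration, not the handler itself: the name validate_grant_request, the (status, payload) return shape, and reading "10 MB" as 10 MiB are all assumptions. The 404 case (session not in DynamoDB) needs a table lookup and is left out.

```python
from typing import Any, Mapping, Optional, Tuple

# Assumption: the plan's "10 MB cap" is read as 10 MiB here.
MAX_AUDIO_BYTES = 10 * 1024 * 1024


def validate_grant_request(
    session_id: Optional[str],
    window_index_raw: Optional[str],
    body: Mapping[str, Any],
) -> Tuple[int, Any]:
    """Return (200, (window_index, size_bytes)) or (400, reason)."""
    if not session_id:
        return 400, "sessionId path param missing"

    # Query-string values arrive as strings. Accepting only ASCII digits
    # rejects missing, negative ("-1"), and non-integer ("7.5") values at once.
    if window_index_raw is None or not (
        window_index_raw.isascii() and window_index_raw.isdigit()
    ):
        return 400, "windowIndex missing, negative, or non-integer"
    window_index = int(window_index_raw)

    size_bytes = body.get("sizeBytes")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(size_bytes, int) or isinstance(size_bytes, bool):
        return 400, "sizeBytes missing or not an integer"
    if size_bytes <= 0 or size_bytes > MAX_AUDIO_BYTES:
        return 400, "sizeBytes out of range"

    return 200, (window_index, size_bytes)
```

Keeping validation free of AWS calls means the 400 rows of the test inventory can be exercised without moto; only the 404 test needs a DDB table.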

PUT <presigned URL> (client → S3, no Lambda)

Headers (all must match URL binding exactly):

  • Content-Type: audio/wav
  • Content-Length: <sizeBytes>

Body: raw WAV bytes. URL expires after 300s.
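A sketch of how the URL minter might bind these headers into the signature. It assumes s3_client is a boto3 S3 client (passed in so the function can be tested with a stand-in) and that SigV4 signing is in effect, so the Content-Type and Content-Length given in Params become signed headers. The function name mint_audio_upload_url is hypothetical.

```python
from typing import Any, Dict

BUCKET = "encache-raw-memory"
URL_TTL_SECONDS = 300


def mint_audio_upload_url(
    s3_client: Any, session_id: str, window_index: int, size_bytes: int
) -> Dict[str, Any]:
    """Build the 200 response body for the URL grant endpoint."""
    # The key is derived server-side; the client never supplies it.
    key = f"sessions/{session_id}/window_{window_index:03d}/audio.wav"
    url = s3_client.generate_presigned_url(
        "put_object",
        Params={
            "Bucket": BUCKET,
            "Key": key,
            # The client's PUT must send the same values for these headers,
            # otherwise S3 is expected to answer 403 SignatureDoesNotMatch.
            "ContentType": "audio/wav",
            "ContentLength": size_bytes,
        },
        ExpiresIn=URL_TTL_SECONDS,
    )
    return {
        "url": url,
        "s3Key": key,
        "windowIndex": window_index,
        "expiresIn": URL_TTL_SECONDS,
    }
```

Whether a given boto3 configuration actually signs Content-Length is what test_url_signature_rejects_wrong_content_length is meant to confirm; the sketch does not prove it.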

AudioUploadCompleteFunction (new, S3-event triggered via EventBridge)

Event source: S3 ObjectCreated event on bucket encache-raw-memory, delivered through the default EventBridge bus with a SAM-managed EventBridgeRule filtering on detail.bucket.name = encache-raw-memory, detail.object.key prefix=sessions/, detail.object.key suffix=audio.wav. EventBridge delivers one event per invocation (no Records wrapper); the handler reads event["detail"]["object"]["key"].

Behavior:

  1. Parse key matching sessions/{sessionId}/window_{NNN}/audio.wav regex; extract sessionId and windowIndex.
  2. If parse fails: log step: audio_upload_malformed_key and return success. Do not fail the invocation: a failure would only make AWS retry an event that can never parse, then dead-letter it.
  3. DDB update_item ADD completedAudioWindows :win, return ALL_NEW.
  4. If session row missing: log step: audio_upload_unknown_session, return success (session deleted mid-flight — orphan stays in S3 as cold storage).
  5. Read captureMode and completedFrameWindows from returned attrs.
  6. Compute should_trigger: captureMode == "audio_only" OR windowIndex in completedFrameWindows.
  7. If should_trigger: call _claim_and_trigger — DDB conditional ADD ingestTriggeredWindows, then Lambda Invoke(InvocationType="Event"). If the invoke fails, the claim is rolled back via DDB DELETE ingestTriggeredWindows so the AWS async-invoke retry can re-claim and re-invoke (without rollback, the retry's conditional would silently reject and drop the window).
  8. Log step transitions matching existing flow: audio_post shape with step: audio_upload_registered.

DLQ: dedicated SQS queue AudioUploadCompleteDLQ, 2 AWS async-invoke retries before DLQ. Manual replay path documented in runbook.
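The event read and the should_trigger decision (steps 1 and 5-6) are small enough to sketch as pure functions. The function names are hypothetical, and the "audio_video" mode value in the usage note is inferred from the test names later in this plan rather than confirmed.

```python
from typing import Any, Mapping, Optional


def extract_object_key(event: Mapping[str, Any]) -> Optional[str]:
    """Read the object key from an EventBridge-delivered S3 event.

    EventBridge delivers one event per invocation with no Records wrapper,
    so the key sits at event["detail"]["object"]["key"].
    """
    try:
        return event["detail"]["object"]["key"]
    except (KeyError, TypeError):
        return None  # unexpected shape: treat like a malformed key


def should_trigger_ingest(window_index: int, attrs: Mapping[str, Any]) -> bool:
    """Step 6: audio-only sessions trigger at once; others wait for frames."""
    if attrs.get("captureMode") == "audio_only":
        return True
    # attrs is the ALL_NEW item returned by the DDB update. A number set may
    # hold Decimal values, which compare and hash equal to the matching int.
    return window_index in set(attrs.get("completedFrameWindows", ()))
```

For example, should_trigger_ingest(7, {"captureMode": "audio_video", "completedFrameWindows": {6}}) is False, so the handler registers the window and leaves the trigger to the frames path.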

S3 key shape

  • sessions/{sessionId}/window_{N:03d}/audio.wav
  • Server-determined, never client-supplied. Prevents path traversal into other sessions.
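A minimal sketch of building and parsing this key shape, assuming session IDs never contain a slash. The names build_audio_key and parse_audio_key are hypothetical. The pattern accepts three or more digits because {N:03d} pads to three but does not truncate larger indices.

```python
import re
from typing import Optional, Tuple

_AUDIO_KEY_RE = re.compile(
    r"sessions/(?P<session_id>[^/]+)/window_(?P<window>[0-9]{3,})/audio\.wav"
)


def build_audio_key(session_id: str, window_index: int) -> str:
    """Server-side key construction used by the URL minter."""
    return f"sessions/{session_id}/window_{window_index:03d}/audio.wav"


def parse_audio_key(key: str) -> Optional[Tuple[str, int]]:
    """Inverse of build_audio_key; returns None for any other key shape."""
    match = _AUDIO_KEY_RE.fullmatch(key)
    if match is None:
        return None
    return match.group("session_id"), int(match.group("window"))
```

Using fullmatch rather than search means a key with extra path segments, such as one containing "../", fails to parse instead of being attributed to the wrong session.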

Data flow

Happy path (one window)

  1. Client queue picks audio item from manifest.
  2. POST /sessions/{id}/audio?windowIndex=7 with {sizeBytes: 1923456}.
  3. SessionAudioFunction validates session exists in DDB, generates URL.
  4. Returns {url, s3Key, windowIndex: 7, expiresIn: 300}.
  5. Client PUTs raw bytes to url with Content-Length: 1923456 and Content-Type: audio/wav.
  6. S3 responds 200 OK.
  7. Client marks item uploaded in dedup index, removes from queue.
  8. S3 fires ObjectCreated event to EventBridge (typically <1s, up to 30s). EventBridge routes to AudioUploadCompleteFunction via the SAM-managed S3AudioCreated rule.
  9. AudioUploadCompleteFunction:
       • Parses key → (id, 7).
       • DDB ADD 7 to completedAudioWindows.
       • Checks captureMode + completedFrameWindows.
       • If ready: _claim_and_trigger fires IngestWindowFunction async.

Failure modes

Step numbers refer to the happy path above.

| Where | What | Recovery |
|---|---|---|
| Step 2 (POST URL grant) | 5xx, network error | Client queue retries using existing PersistentUploadQueue logic, unchanged |
| Step 2 | 400 invalid sessionId/windowIndex/sizeBytes | Client gives up, item parked, surfaced via parkedCount metric |
| Step 5 (PUT to S3) | 403 SignatureDoesNotMatch | Means client tampered with headers OR clock skew OR Content-Length mismatch. Park + retry from step 2 (fresh URL) |
| Step 5 | 5xx from S3 | Park + retry from step 2 (fresh URL); uploadFn never reuses a URL across attempts |
| Step 5 | URL expired (>5 min) | Restart from step 2: client requests fresh URL |
| Step 8 (S3 event) | S3 event fails to deliver to Lambda | AWS retries automatically; after 2 failures → DLQ |
| Step 9 (AudioUploadComplete) | DDB throttle | Lambda exception → AWS async retry (2x) → DLQ |
| Step 9 | sessionId resolves to no DDB row (deleted mid-flight) | Log audio_upload_unknown_session, return success. Object stays as cold storage |
| Step 9 | _claim_and_trigger ingest invoke fails | Claim rolled back via DDB DELETE ingestTriggeredWindows. Lambda exception → AWS async retry (now able to re-claim). After 2 retry failures → DLQ |
| Step 9 | Claim rollback itself fails (DDB throttle during DELETE) | Log ingest_claim_rollback_failed. Original invoke failure still propagates. Manual cleanup: operator deletes window from ingestTriggeredWindows set to unblock retry |
| Step 9 | Duplicate S3 event (same key fires twice) | DDB ADD is idempotent (set semantics); _claim_and_trigger conditional dedups ingest invoke |

Race: frame vs audio window completion

Unchanged from today. Both AudioUploadCompleteFunction (was SessionAudioFunction) and FramesFunction race to check completedFrameWindows ∩ completedAudioWindows and call _claim_and_trigger. The DDB conditional in _claim_and_trigger ensures exactly one wins. No new race introduced.
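The claim-then-invoke pattern, including the rollback on invoke failure, can be illustrated without AWS. In this sketch an in-memory class stands in for the ingestTriggeredWindows set and a lock stands in for the DDB conditional write. InMemoryClaims, try_claim, release, and the invoke callback are hypothetical names; the real _claim_and_trigger may be shaped differently.

```python
import logging
import threading
from typing import Callable, Dict, Set

logger = logging.getLogger(__name__)


class InMemoryClaims:
    """Stand-in for ingestTriggeredWindows; the lock mimics a conditional write."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claimed: Dict[str, Set[int]] = {}

    def try_claim(self, session_id: str, window_index: int) -> bool:
        with self._lock:
            windows = self._claimed.setdefault(session_id, set())
            if window_index in windows:
                return False  # condition failed: another caller holds the claim
            windows.add(window_index)
            return True

    def release(self, session_id: str, window_index: int) -> None:
        with self._lock:
            self._claimed.get(session_id, set()).discard(window_index)


def claim_and_trigger(
    claims: InMemoryClaims,
    invoke: Callable[[str, int], None],
    session_id: str,
    window_index: int,
) -> bool:
    """Return True if this caller won the claim and the invoke succeeded."""
    if not claims.try_claim(session_id, window_index):
        logger.info("ingest_claim_lost")  # the other racer triggers ingest
        return False
    try:
        invoke(session_id, window_index)
    except Exception:
        try:
            # Roll back so the AWS async retry can claim again.
            claims.release(session_id, window_index)
            logger.warning("ingest_claim_rolled_back")
        except Exception:
            logger.error("ingest_claim_rollback_failed")
        raise  # the original invoke failure still propagates
    return True
```

Whichever of the audio and frame handlers calls try_claim second gets False and does nothing, which is what keeps the ingest invoke at exactly one per window.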

Orphan handling

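The S3 event is the source of truth, so no audio object goes unaccounted for. Every audio.wav object written under sessions/ fires an event; every event reaches the Lambda; the Lambda either registers the window, logs and skips it (malformed key or session deleted mid-flight), or ends up in the DLQ. DLQ contents are an operational signal, not data loss, because they can be replayed manually.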

Client queue contract

The queue's uploadFn currently makes one POST per audio item. After migration:

async function uploadAudio(item) {
  const file = new File(toFileUri(item.uri));
  const bytes = await file.bytes();
  const sizeBytes = bytes.byteLength;

  const { url } = await api.post(
    `/sessions/${sessionId}/audio?windowIndex=${item.windowIndex}`,
    JSON.stringify({ sizeBytes }),
    { headers: { "Content-Type": "application/json" }, timeout: 5000 },
  );

  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 60000);
  try {
    const resp = await fetch(url, {
      method: "PUT",
      headers: { "Content-Type": "audio/wav", "Content-Length": String(sizeBytes) },
      body: bytes,
      signal: controller.signal,
    });
    if (!resp.ok) throw new Error(`S3 PUT failed: ${resp.status}`);
  } finally {
    clearTimeout(timeoutId);
  }
}

The 60s AbortController on the PUT prevents a stalled cell upload from blocking the upload queue indefinitely. Without it, the queue's per-item retry policy never fires because the fetch never resolves or rejects.

If either step throws, the queue parks the item and retries with full reset (fresh URL). uploadFn atomicity preserved.

Testing strategy (no test theater)

Discipline:

  1. Spec → test → code, in that order. Every test cites a Failure-Modes table row or Happy-Path step as its justification. Tests written before the corresponding production code exists.
  2. Test the contract, not the call. Don't assert mock_s3.put_object.called_with(...) when the input to that mock is the same constant the production code uses — that's circular. Assert on observable state: object exists in S3 with expected key+size, DDB row contains expected attribute, ingest Lambda received expected payload.
  3. Mock only at the AWS edge. Use moto for S3+DDB+Lambda, not handcrafted mocks of boto3 clients. moto enforces real S3 semantics (key uniqueness, event payload shape, signature binding) so passing tests are stronger evidence.
  4. One failure mode = one named test. Test name describes the behavior under test, not the function being tested.
  5. Red-green-refactor literally. Each test added to the suite runs red, then a minimal code change makes it green, then refactor if needed.

Test inventory

SessionAudioFunction (URL minter) — unit, moto for DDB session lookup:

  • test_returns_presigned_url_with_correct_s3_key_for_valid_request
  • test_url_binds_content_type_audio_wav
  • test_url_binds_exact_content_length_from_size_bytes
  • test_url_expires_in_300_seconds
  • test_returns_400_when_session_id_path_param_missing
  • test_returns_400_when_window_index_missing
  • test_returns_400_when_window_index_negative
  • test_returns_400_when_window_index_non_integer
  • test_returns_400_when_size_bytes_missing
  • test_returns_400_when_size_bytes_zero_or_negative
  • test_returns_400_when_size_bytes_exceeds_10mb_cap
  • test_returns_404_when_session_id_not_in_dynamodb

AudioUploadCompleteFunction (S3 event handler) — unit, moto for DDB+Lambda:

  • test_happy_path_audio_only_mode_triggers_ingest_immediately
  • test_audio_video_mode_triggers_ingest_only_when_frames_ready
  • test_audio_video_mode_does_not_trigger_when_frames_pending
  • test_adds_window_index_to_completed_audio_windows_in_ddb
  • test_duplicate_s3_event_does_not_double_trigger_ingest
  • test_malformed_s3_key_logs_and_returns_success
  • test_unknown_session_id_logs_session_not_found_and_returns_success
  • test_ddb_throttle_raises_for_aws_retry
  • test_ingest_invoke_failure_raises_for_aws_retry
  • test_concurrent_audio_and_frame_completion_invoke_ingest_exactly_once

Integration (end-to-end with moto, single test file):

  • test_full_upload_flow_writes_audio_and_triggers_ingest
  • test_url_signature_rejects_wrong_content_length
  • test_url_signature_rejects_wrong_content_type
  • test_expired_url_returns_403_on_put

Client (capture-session.ts uploadFn) — Jest:

  • uploads_audio_when_both_url_grant_and_put_succeed
  • parks_item_when_url_grant_returns_5xx
  • parks_item_when_put_to_s3_returns_5xx
  • parks_item_when_put_returns_403_signature_mismatch
  • refetches_url_after_5xx_retry_does_not_reuse_stale_url
  • does_not_remove_item_from_queue_if_put_throws

Deferred test scope

Filed as separate issues with explicit resolution triggers:

  • #496: property-based tests (hypothesis + fast-check); trigger: 2 weeks green CI with real users, OR regression slip, OR contract refactor.
  • #497: real-AWS integration test (not moto); trigger: boto3 major bump, OR presigned URL param change, OR moto-vs-real divergence.
  • #498: load test for AudioUploadCompleteFunction; trigger: ~4000 concurrent recording users projected, OR first DDB throttle alarm, OR first DLQ depth > 0, OR pre-launch event.

Done criterion for testing

  • All tests above written and failing before any production code for them exists. Git history shows test commits preceding implementation commits.
  • Final suite: 100% green, no skips, no xfails.
  • Mutation check on critical paths: reverting the corresponding production logic causes the test to fail (guards against tests that pass for the wrong reason).

Deployment + infrastructure

CloudFormation / SAM changes (main/server/template.yaml)

| Resource | Action | Notes |
|---|---|---|
| SessionAudioFunction (existing) | Modify: keep route, rewrite handler | Smaller code; can drop MemorySize if profile shows reduced footprint |
| AudioUploadCompleteFunction | New | Runtime: python3.12, Timeout: 30, MemorySize: 256, layers: SharedLayer, dedicated DLQ |
| AudioUploadCompleteDLQ | New | Type: AWS::SQS::Queue, redrive from Lambda (2 retries → DLQ) |
| AudioUploadCompletePermission | New | AWS::Lambda::Permission granting s3.amazonaws.com invoke rights on AudioUploadCompleteFunction |
| SessionAudioFunction.Policies | Modify | Drop DDB update_item on completedAudioWindows and lambda:InvokeFunction on IngestWindowFunction (no longer needed) |
| AudioUploadCompleteFunction.Policies | New | DatabaseSsmPolicy, DDB update on encache-sessions, lambda:InvokeFunction for IngestWindowFunction |

Note: S3 ObjectCreated notification on encache-raw-memory is wired in Terraform (main/devops/main.tf) — the bucket is Terraform-managed, not CloudFormation. SAM contributes only AudioUploadCompleteFunction + AudioUploadCompleteDLQ + AudioUploadCompletePermission (Lambda permission grant for s3.amazonaws.com).

Terraform changes (main/devops/main.tf)

| Resource | Action | Notes |
|---|---|---|
| data "aws_lambda_function" "audio_upload_complete" | New | Looks up the SAM-deployed Lambda by deterministic name server-AudioUploadCompleteFunction |
| aws_s3_bucket_notification.raw_data_audio | New | Wires s3:ObjectCreated:* on prefix sessions/, suffix audio.wav to the looked-up Lambda |

IAM principle

SessionAudioFunction loses its DDB write and Lambda invoke permissions. It keeps s3:PutObject, because a presigned URL is authorized with the permissions of the principal that signed it. AudioUploadCompleteFunction gets only the DDB and invoke permissions that were removed. Net IAM surface is unchanged but cleanly separated by responsibility.

Deploy order

  1. cd main/server && sam deploy — creates the Lambda + permission grant. Lambda name pinned via FunctionName: !Sub "${AWS::StackName}-AudioUploadCompleteFunction" so the Terraform lookup is deterministic.
  2. cd main/devops && terraform apply — wires the notification. Reverse order fails: data lookup errors if the Lambda doesn't exist.

With zero users today, no canary or phased rollout is required.

Observability

New CloudWatch alarms:

  • AudioUploadCompleteFunction Errors > 0 over 5 min → Discord webhook
  • AudioUploadCompleteDLQ ApproximateNumberOfMessagesVisible > 0 → Discord webhook
  • SessionAudioFunction Errors > 5/min → Discord webhook (URL grant failures are user-facing)

Log grep patterns for runbook:

  • "step": "audio_url_granted" — successful URL mint
  • "step": "audio_upload_registered" — S3 event handler success
  • "step": "audio_upload_unknown_session" — session deleted mid-flight; rare
  • "step": "audio_upload_malformed_key" — should never happen; investigate immediately
  • "step": "ingest_claim_lost" — race between audio+frame triggers; harmless
  • "step": "ingest_claim_rolled_back" — invoke failed; claim removed so AWS retry can re-claim. Expected to be rare; spike indicates ingest Lambda health issues.
  • "step": "ingest_claim_rollback_failed" — both invoke AND rollback failed. Investigate immediately; window is stuck (claim recorded but ingest never fired and retry will see the claim).
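These grep patterns only match if the log lines are serialized with a space after the colon. A minimal sketch of a formatter that produces such lines, assuming logs are emitted as one JSON object per line; format_step_log and the field names are hypothetical.

```python
import json
from typing import Any


def format_step_log(step: str, **fields: Any) -> str:
    """Serialize one structured log line carrying a "step" field.

    json.dumps uses ", " and ": " as default separators, so the output
    contains the literal text '"step": "<name>"' that the runbook greps for.
    Passing compact separators=(",", ":") would break those patterns.
    """
    return json.dumps({"step": step, **fields}, sort_keys=True)
```

For example, format_step_log("audio_upload_registered", sessionId="abc", windowIndex=7) yields a line that the "step": "audio_upload_registered" pattern matches.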

Cutover

No flag, no canary, no gradual rollout (no users). Ship via the standard deploy order: sam deploy, then terraform apply.

Post-deploy verification:

  1. Run integration test suite against deployed stack (not just moto).
  2. Manually exercise capture flow on a test device → confirm audio chunk lands in S3 + DDB updates + ingest fires.
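  3. Trigger an artificial handler failure (for example, temporarily deny the function's DDB update permission) → confirm the event reaches the DLQ after retries. A bad URL signature does not exercise this path: S3 rejects the PUT with 403, so no object is created and no event fires.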

Rollback

git revert + sam deploy. The old handler code is restored, and the S3 notification is removed because the new function no longer exists. Caveat: any audio chunks PUT to S3 between the rollback decision and rollback completion will not be registered (the handler is gone). Acceptable risk given the zero-user scenario.

Out of scope

  1. sessions/frames migration — body is small (~100 KB), current path fits 29s cap, doubling round trips would hurt.
  2. memories/video large multipart I/O — tracked in #494.
  3. Repository-wide enforcement of boto3.client(..., config=...) — tracked in #495.
  4. Backward-compatible legacy _handle_legacy_upload path — deleted entirely; no users today.
  5. Property-based tests — #496.
  6. Real-AWS integration tests — #497.
  7. Load tests — #498.

Files affected

Server (Python):

  • main/server/api/sessions/audio/app.py — rewrite (URL minter only)
  • main/server/events/audio_upload_complete/app.py — new Lambda handler. New top-level events/ directory introduced for S3/SNS/EventBridge-triggered handlers (no existing event-driven Lambda in repo, so no precedent — clean separation from api/ for HTTP and worldmm/ for the ingest pipeline).
  • main/server/template.yaml: new function + DLQ + Lambda permission + IAM changes (the S3 notification itself is wired in Terraform, main/devops/main.tf)
  • main/server/tests/unit/test_session_audio_url_minter.py — new unit tests per Test Inventory (follows repo convention of centralized tests/unit/, not per-handler __tests__/)
  • main/server/tests/unit/test_audio_upload_complete.py — new unit tests per Test Inventory
  • main/server/tests/integration/test_audio_presigned_flow.py — new end-to-end tests

Client (TypeScript):

  • main/app/lib/capture-session.ts — rewrite audio branch of uploadFn (lines 103-112)
  • main/app/lib/__tests__/capture-session.test.ts — new tests per Test Inventory

Docs:

  • docs/plans/2026-05-19-sessions-audio-presigned.md — this file
  • docs/docs/ — update affected system docs once code lands (separate pass)