
Memory Upload Part PUT Fails With Generic Network Error

Metadata

  • Date: 2026-03-21
  • Status: investigating
  • Severity: medium
  • Related issue/ticket: N/A
  • Owner: AI + Memory Upload Flow

About

Overview:

  • Upload start and presigned-part URL generation succeed, but the client fails on the direct S3 PUT with Network request failed.
  • This blocks memory media completion and marks uploads as failed in memories_uploads_complete_app.
  • Newer runs show a mixed symptom: the app surfaces AxiosError: Network Error while backend logs show /memories/uploads -> /parts -> /complete all returning 200 for at least one request in the same period.

Technical Questions:

  • Is the failure caused by mobile runtime networking (DNS/TLS/transport), by request-body semantics, or by environment connectivity?
  • Does the failure occur only for specific device/runtime/network combinations?
  • Are there lower-level network logs (device/native) available for the same timestamp?
  • Is request abortion involved (AbortSignal) in any failing runs?
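One way to start answering the transport-versus-abort question from client logs alone is to bucket each caught error before logging it. The sketch below is illustrative only: `classifyUploadFailure` and its category names are hypothetical and not part of createNewMemory.ts. It relies on the error shapes seen in this investigation (a fetch TypeError with message "Network request failed", an AxiosError with message "Network Error") plus the error codes axios sets for network failures and cancellations.

```typescript
// Hypothetical helper (not in the codebase): buckets an upload error so logs
// can separate cancellation from transport failure. Category names are illustrative.
type UploadFailureKind = "aborted" | "transport" | "http_rejected" | "unknown";

interface ErrorLike {
  name?: string;
  message?: string;
  code?: string; // axios sets e.g. "ERR_NETWORK" or "ERR_CANCELED"
  response?: { status?: number }; // present on axios errors when a server replied
}

function classifyUploadFailure(err: unknown): UploadFailureKind {
  if (typeof err !== "object" || err === null) return "unknown";
  const e = err as ErrorLike;

  // Cancellation: fetch rejects with an AbortError; axios uses CanceledError / ERR_CANCELED.
  if (e.name === "AbortError" || e.name === "CanceledError" || e.code === "ERR_CANCELED") {
    return "aborted";
  }

  // A response with a status means the request reached a server that answered.
  if (typeof e.response?.status === "number") return "http_rejected";

  // No response at all: the two messages observed in this investigation,
  // plus the axios network error code.
  if (
    e.code === "ERR_NETWORK" ||
    e.message === "Network request failed" ||
    e.message === "Network Error"
  ) {
    return "transport";
  }

  return "unknown";
}
```

A "transport" result still cannot distinguish DNS, TLS, and connectivity problems; that needs the device/native logs asked about above.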

Resources:

  • Frontend uploader path: main/app/lib/api/memory/createNewMemory.ts
  • Debugger trigger path: main/app/components/Debugger.tsx
  • User-provided runtime signature:
      • upload_part_requested logged
      • upload_part_runtime_failed with error_message: "Network request failed"
      • server-side /memories/uploads/{uploadId}/parts returned 200
      • later run: app AxiosError: Network Error while server finalized upload and returned /complete 200

Steps to cause failure

flowchart LR
Start[Create memory] --> StartMPU[/memories/uploads 200/]
StartMPU --> PresignedPart[/memories/uploads/uploadId/parts 200/]
PresignedPart --> PutToS3[Client PUT presigned URL]
PutToS3 --> RuntimeError[Network request failed]
RuntimeError --> CompleteFailed[/memories/uploads/uploadId/complete with failed_uploads/]

System

flowchart TD
Mobile[Mobile app createNewMemory] --> APIStart[memories_uploads_start_app]
Mobile --> APIPart[memories_uploads_part_app]
Mobile --> S3[S3 presigned upload_part URL]
Mobile --> APIComplete[memories_uploads_complete_app]
APIPart --> S3


Reproduction Details

  1. Start a memory upload with a video item.
  2. Obtain presigned URL successfully from /memories/uploads/{uploadId}/parts.
  3. Observe the direct PUT to the presigned URL (issued with fetch) fail with TypeError: Network request failed.

Reproduction test (unit preferred): main/app/__tests__/create-new-memory.test.ts adds a Network request failed path and asserts diagnostic log fields.
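As a rough illustration of the kind of diagnostic payload such a test could assert on, the sketch below builds a log record for a failed part PUT. Only the event name upload_part_runtime_failed and the error_message field come from the logs above; the function name `buildPartFailureLog` and every other field are assumptions made for this example, not the fields actually emitted by createNewMemory.ts.

```typescript
// Hypothetical context captured by the uploader before the PUT is attempted.
interface PartFailureContext {
  uploadId: string;
  partNumber: number;
  presignedUrl: string;
  bodyBytes: number;
}

// Illustrative payload shape; only `event` and `error_message` are taken from the logs.
interface PartFailureLog {
  event: "upload_part_runtime_failed";
  upload_id: string;
  part_number: number;
  error_name: string;
  error_message: string;
  url_host: string;
  body_bytes: number;
}

function buildPartFailureLog(err: unknown, ctx: PartFailureContext): PartFailureLog {
  // Normalize non-Error throwables so the name and message are always strings.
  const error = err instanceof Error ? err : new Error(String(err));

  // Log only the host: the query string of a presigned URL carries the signature.
  let host = "unparseable";
  try {
    host = new URL(ctx.presignedUrl).host;
  } catch {
    // Keep the placeholder when the URL cannot be parsed.
  }

  return {
    event: "upload_part_runtime_failed",
    upload_id: ctx.uploadId,
    part_number: ctx.partNumber,
    error_name: error.name,
    error_message: error.message,
    url_host: host,
    body_bytes: ctx.bodyBytes,
  };
}
```

Recording the error name alongside the message helps separate a fetch TypeError from an axios error when both surface as generic network failures.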

Notes for PR

Root cause is not yet proven from existing logs. For the later runs, the evidence strongly suggests a client-side transport problem or race condition (overlapping debug triggers or a cancelled in-flight axios request), in addition to the earlier S3 PUT-path failures: the backend completed the full upload lifecycle (start, parts, complete) successfully while the app still emitted AxiosError: Network Error.
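If overlapping debug triggers do turn out to be a factor, one possible mitigation is a single-flight guard around the upload entry point, so that a second trigger while an upload is in flight reuses the existing promise instead of starting a competing request. This is a sketch under that unproven assumption; `singleFlight` is a hypothetical helper, and nothing here reflects how Debugger.tsx currently invokes the uploader.

```typescript
// Hypothetical guard: while one call is in flight, later calls receive the
// same promise instead of starting a second, overlapping run.
function singleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;

  return () => {
    if (inFlight) return inFlight;

    // Clear the slot once the run settles (success or failure) so that
    // the next trigger starts a fresh run.
    const run = task().finally(() => {
      inFlight = null;
    });
    inFlight = run;
    return run;
  };
}
```

Wrapping the debug trigger's handler this way would also make duplicate triggers visible in logs, since a single start event would then correspond to several trigger presses.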

Audit Log

| ID | Action | Note | Context |
| --- | --- | --- | --- |
| 1 | Create audit log | Initialized investigation for generic network failure on S3 part upload | user report + logs |
| 2 | Inspect existing logs | Confirmed start/part endpoints return 200; failure happens in client direct PUT stage | server/client log comparison |
| 3 | Add diagnostics | Added upload_part_put_requested and enriched upload_part_runtime_failed context | createNewMemory.ts |
| 4 | Add regression-style test | Ensured Network request failed emits diagnostic payload | create-new-memory.test.ts |
| 5 | Analyze rerun logs | Confirmed failures correlated with presigned URL signature/version differences and Android runtime TypeError payloads | user rerun logs |
| 6 | Apply targeted fix | Switched mobile upload PUT body to Blob and forced part presign generation to SigV4 (s3v4) | app uploader + part lambda |
| 7 | Compare client/server timelines | Observed app AxiosError: Network Error while server completed start -> parts -> complete with HTTP 200 | 2026-03-22 08:06 logs |
| 8 | Narrow hypothesis | Treat current failure as likely client transport race/cancellation rather than backend rejection for those runs | duplicate trigger + concurrent request evidence |
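Because the rerun logs tied failures to presigned URL signature/version differences, it can help to check on the client which signing scheme a returned URL uses before attempting the PUT. The sketch below infers this from the query parameters that S3 presigned URLs carry: SigV4 URLs include X-Amz-Algorithm=AWS4-HMAC-SHA256 and X-Amz-Signature, while legacy SigV2 URLs include AWSAccessKeyId and Signature. The function name `detectPresignVersion` is hypothetical and not part of the app uploader.

```typescript
type PresignVersion = "sigv4" | "sigv2" | "unknown";

// Hypothetical diagnostic: infers the signing scheme from the query string
// of a presigned S3 URL without contacting S3.
function detectPresignVersion(presignedUrl: string): PresignVersion {
  const params = new URL(presignedUrl).searchParams;

  // SigV4 presigned URLs carry the algorithm and an X-Amz-Signature parameter.
  if (params.get("X-Amz-Algorithm") === "AWS4-HMAC-SHA256" && params.has("X-Amz-Signature")) {
    return "sigv4";
  }

  // Legacy SigV2 presigned URLs carry AWSAccessKeyId and Signature instead.
  if (params.has("AWSAccessKeyId") && params.has("Signature")) {
    return "sigv2";
  }

  return "unknown";
}
```

Logging this value next to upload_part_put_requested would show whether every part URL is SigV4 after the presign change in row 6.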

Verification

  • [ ] Reproduced failure before fix
  • [ ] Reproduction test fails before fix
  • [ ] Root cause identified with evidence
  • [ ] Fix applied at source (no workaround-only patch)
  • [ ] Reproduction test passes after fix
  • [ ] Reproduction path now passes
  • [x] Regression test added/updated (or N/A with reason)
  • [x] Verified no duplicate solved-bug log exists for same root cause