Memory Upload Part PUT Fails With Generic Network Error
Metadata
- Date: 2026-03-21
- Status: investigating
- Severity: medium
- Related issue/ticket: N/A
- Owner: AI + Memory Upload Flow
About
Overview:
- Upload start and presigned-part URL generation succeed, but the client fails on the direct S3 PUT with Network request failed.
- This blocks memory media completion and marks uploads as failed in memories_uploads_complete_app.
- Newer runs show a mixed symptom: the app surfaces AxiosError: Network Error while backend logs show /memories/uploads -> /parts -> /complete all returning 200 for at least one request in the same period.
Technical Questions:
- Is the failure caused by mobile runtime networking (DNS/TLS/transport), by request-body semantics, or by environment connectivity?
- Does the failure occur only for specific device/runtime/network combinations?
- Are there lower-level network logs (device/native) available for the same timestamp?
- Is request abortion involved (AbortSignal) in any failing runs?
Resources:
- Frontend uploader path: main/app/lib/api/memory/createNewMemory.ts
- Debugger trigger path: main/app/components/Debugger.tsx
- User-provided runtime signature:
  - upload_part_requested logged
  - upload_part_runtime_failed with error_message: "Network request failed"
  - server-side /memories/uploads/{uploadId}/parts returned 200
  - later run: app AxiosError: Network Error while server finalized upload and returned /complete 200
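The technical questions above hinge on telling an aborted request apart from a transport failure or a server rejection. A minimal sketch of such a classifier follows. It assumes the error shapes commonly produced by React Native fetch (TypeError with message "Network request failed", AbortError on abort) and by axios v1 (codes ERR_NETWORK and ERR_CANCELED). The function name and the failure-kind labels are hypothetical and do not come from createNewMemory.ts.

```typescript
// Hypothetical failure classes, used only for this sketch.
type UploadFailureKind = "aborted" | "http_rejection" | "transport" | "unknown";

// Loose view of the fields that fetch and axios errors usually expose.
interface ErrorLike {
  name?: string;
  message?: string;
  code?: string;
  response?: { status?: number };
}

function classifyUploadError(err: unknown): UploadFailureKind {
  if (typeof err !== "object" || err === null) return "unknown";
  const e = err as ErrorLike;

  // Cancellation: fetch rejects with an AbortError, axios with ERR_CANCELED.
  if (e.name === "AbortError" || e.name === "CanceledError" || e.code === "ERR_CANCELED") {
    return "aborted";
  }
  // A response object means the server answered, so this is not a transport failure.
  if (typeof e.response?.status === "number") {
    return "http_rejection";
  }
  // No response at all: the generic messages seen in the app logs.
  if (
    e.code === "ERR_NETWORK" ||
    e.message === "Network Error" ||
    e.message === "Network request failed"
  ) {
    return "transport";
  }
  return "unknown";
}
```

Logging the returned kind next to upload_part_runtime_failed would show directly whether any failing run was a cancellation rather than a transport error.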
Steps to cause failure
flowchart LR
Start[Create memory] --> StartMPU[/memories/uploads 200/]
StartMPU --> PresignedPart[/memories/uploads/uploadId/parts 200/]
PresignedPart --> PutToS3[Client PUT presigned URL]
PutToS3 --> RuntimeError[Network request failed]
RuntimeError --> CompleteFailed[/memories/uploads/uploadId/complete with failed_uploads/]
System
flowchart TD
Mobile[Mobile app createNewMemory] --> APIStart[memories_uploads_start_app]
Mobile --> APIPart[memories_uploads_part_app]
Mobile --> S3[S3 presigned upload_part URL]
Mobile --> APIComplete[memories_uploads_complete_app]
APIPart --> S3
Reproduction Details
- Start a memory upload with a video item.
- Obtain presigned URL successfully from /memories/uploads/{uploadId}/parts.
- Fail on direct fetch(PUT presigned_url) with TypeError: Network request failed.
Reproduction test (unit preferred): main/app/__tests__/create-new-memory.test.ts adds a Network request failed path and asserts diagnostic log fields.
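The diagnostic fields asserted by the reproduction test are not listed in this log. As an illustration only, the sketch below shows one way a failure payload could be assembled. Apart from the event name upload_part_runtime_failed and the error_message field, which appear in the user-provided signature, every field name and helper here is an assumption. It records only the host of the presigned URL, since the query string carries the signature.

```typescript
// Hypothetical context for a single part upload; field names are illustrative.
interface PartUploadContext {
  uploadId: string;
  partNumber: number;
  presignedUrl: string;
  bodySizeBytes: number;
}

interface PartFailureDiagnostics {
  event: "upload_part_runtime_failed";
  error_name: string;
  error_message: string;
  upload_id: string;
  part_number: number;
  url_host: string;
  body_size_bytes: number;
}

// Extract the host without relying on a URL polyfill in the mobile runtime.
function hostOf(url: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)/i.exec(url);
  return match?.[1] ?? "";
}

function buildPartFailureDiagnostics(
  err: unknown,
  ctx: PartUploadContext
): PartFailureDiagnostics {
  const e = err instanceof Error ? err : null;
  return {
    event: "upload_part_runtime_failed",
    error_name: e ? e.name : typeof err,
    error_message: e ? e.message : String(err),
    upload_id: ctx.uploadId,
    part_number: ctx.partNumber,
    // Host only: never log the signed query string of a presigned URL.
    url_host: hostOf(ctx.presignedUrl),
    body_size_bytes: ctx.bodySizeBytes,
  };
}
```

A unit test could then pass new TypeError("Network request failed") through this builder and assert on error_name and error_message without touching the network.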
Notes for PR
Root cause is not yet proven from existing logs. Current evidence strongly suggests at least one client-side transport or race condition (overlapping debug triggers or a cancelled in-flight axios request), in addition to the earlier S3 PUT-path failures: the backend completed the full upload lifecycle (start, parts, complete) successfully while the app still emitted AxiosError: Network Error.
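If overlapping debug triggers turn out to be part of the problem, one way to rule them out during investigation is a single-flight guard around the trigger. The sketch below is hypothetical and is not taken from Debugger.tsx. It assumes the upload entry point is an async function with no arguments, and the name singleFlight is illustrative.

```typescript
// Wrap an async task so that calls made while a run is in flight
// share that run's promise instead of starting a second one.
function singleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;

  return () => {
    if (inFlight !== null) {
      return inFlight; // A run is already active: reuse it.
    }
    const run = task();
    inFlight = run;
    // Clear the slot on success or failure so a later call can start fresh.
    const clear = (): void => {
      inFlight = null;
    };
    run.then(clear, clear);
    return run;
  };
}
```

Wrapping the debug trigger once, for example const guardedUpload = singleFlight(startUpload) where startUpload is a placeholder name, means a double tap yields one upload lifecycle. If the AxiosError: Network Error reports stop under the guard, that supports the race hypothesis. If they continue, the cause lies elsewhere.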
Audit Log
| ID | Action | Note | Context |
|---|---|---|---|
| 1 | Create audit log | Initialized investigation for generic network failure on S3 part upload | user report + logs |
| 2 | Inspect existing logs | Confirmed start/part endpoints return 200; failure happens in client direct PUT stage | server/client log comparison |
| 3 | Add diagnostics | Added upload_part_put_requested and enriched upload_part_runtime_failed context | createNewMemory.ts |
| 4 | Add regression-style test | Ensured Network request failed emits diagnostic payload | create-new-memory.test.ts |
| 5 | Analyze rerun logs | Confirmed failures correlated with presigned URL signature/version differences and Android runtime TypeError payloads | user rerun logs |
| 6 | Apply targeted fix | Switched mobile upload PUT body to Blob and forced part presign generation to SigV4 (s3v4) | app uploader + part lambda |
| 7 | Compare client/server timelines | Observed app AxiosError: Network Error while server completed start -> parts -> complete with HTTP 200 | 2026-03-22 08:06 logs |
| 8 | Narrow hypothesis | Treat current failure as likely client transport race/cancellation rather than backend rejection for those runs | duplicate trigger + concurrent request evidence |
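Rows 5 and 6 refer to presigned URL signature/version differences and to forcing SigV4. A small check like the sketch below could confirm which signing scheme a given part URL uses before the PUT is attempted. It relies on the query parameters that S3 presigned URLs are known to carry: X-Amz-Algorithm=AWS4-HMAC-SHA256 for SigV4, and AWSAccessKeyId with Signature for the older SigV2 form. The function names are hypothetical.

```typescript
type PresignVersion = "s3v4" | "s3v2" | "unknown";

// Minimal query-string parser that avoids URL/URLSearchParams polyfill
// differences across mobile runtimes. Values are left undecoded.
function queryParams(url: string): Record<string, string> {
  const params: Record<string, string> = {};
  const queryStart = url.indexOf("?");
  if (queryStart === -1) return params;
  const query = url.slice(queryStart + 1).split("#")[0] ?? "";
  for (const pair of query.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    if (eq === -1) {
      params[pair] = "";
    } else {
      params[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  }
  return params;
}

function detectPresignVersion(presignedUrl: string): PresignVersion {
  const params = queryParams(presignedUrl);
  // SigV4 presigned URLs declare their algorithm explicitly.
  if (params["X-Amz-Algorithm"] === "AWS4-HMAC-SHA256") return "s3v4";
  // SigV2 presigned URLs carry the access key id and a bare Signature.
  if ("AWSAccessKeyId" in params && "Signature" in params) return "s3v2";
  return "unknown";
}
```

Logging only the detected version, never the URL itself, alongside upload_part_put_requested would make it possible to correlate failures with the signing scheme without exposing the signature.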
Verification
- [ ] Reproduced failure before fix
- [ ] Reproduction test fails before fix
- [ ] Root cause identified with evidence
- [ ] Fix applied at source (no workaround-only patch)
- [ ] Reproduction test passes after fix
- [ ] Reproduction path now passes
- [x] Regression test added/updated (or N/A with reason)
- [x] Verified no duplicate solved-bug log exists for same root cause