Glasses Video Processing: WAV Audio Written to .m4a Temp File

Metadata

Date: 2026-04-26
Status: fixed
Severity: high
Related issue/ticket: N/A
Owner: N/A

About

Overview:
- Glasses-mode (captureMode: "audio_video") video upload sessions fail silently at the transcription step inside IngestWindowFunction.
- The mobile app uploads 30-second WAV audio chunks to S3 at sessions/{sessionId}/window_{N:03d}/audio.wav.
- When IngestWindowFunction downloads that audio and passes it to Groq Whisper for transcription, it writes the WAV bytes to a tempfile.NamedTemporaryFile with suffix=".m4a".
- Groq Whisper detects the file format by extension. A file named with a .m4a extension but containing raw WAV data causes a format mismatch: the API raises an InvalidRequestError or returns an empty transcript, which breaks the entire window-processing pipeline.
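For reference, the per-window key pattern above zero-pads the window index to three digits. A minimal sketch of that formatting follows; the helper name window_audio_key is hypothetical and is not taken from the codebase.

```python
def window_audio_key(session_id: str, window_index: int) -> str:
    """Build the per-window audio key; the index is zero-padded to three digits.

    Hypothetical helper for illustration only; the real upload handler may
    construct the key differently.
    """
    return f"sessions/{session_id}/window_{window_index:03d}/audio.wav"


if __name__ == "__main__":
    print(window_audio_key("abc123", 7))
```

The key always ends in .wav, which is the extension the temp file used for transcription needs to carry.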

Technical Questions:
- Are we making assumptions? No. The suffix .m4a in the source code is an explicit, clearly wrong constant.
- How old is this bug? Introduced when the per-window audio chunk upload flow was added (chunked WAV path). The function was originally written for legacy full-session M4A uploads.
- Is there anything obvious we might have missed? Yes. The legacy S3 fallback path (sessions/{sessionId}/audio/full.m4a) is M4A audio, but the per-window path is always WAV. The suffix was never updated when the new upload path was introduced.
- Are there specific system states required to reproduce it? Glasses mode (captureMode: "audio_video") with audio chunks uploaded via the new per-window path.

Resources:
- main/server/worldmm/pipeline/ingest_window.py: _transcribe_audio function, line 415
- main/server/api/sessions/audio/app.py: _handle_chunk_upload, confirms S3 key is audio.wav
- main/server/tests/unit/test_ingest_window_transcribe.py: regression test

Steps to cause failure

flowchart LR
    GlassesAudio([Glasses WAV audio]) -->|POST audio?windowIndex=N| AudioPost[AudioPostFunction]
    AudioPost -->|PutObject audio.wav| S3[(S3)]
    S3 -->|GetObject| IngestWindow[IngestWindowFunction]
    IngestWindow -->|write WAV bytes to .m4a tmp file| Whisper([Groq Whisper API])
    Whisper -->|format mismatch error or empty transcript| Failure([Window processing fails])

System

flowchart TD
    App[Mobile App] -->|WAV chunks via onAudioChunkReady| AudioPost[AudioPostFunction]
    AudioPost -->|sessions/id/window_N/audio.wav| S3
    S3 -->|_load_audio_from_s3| IngestWindow
    IngestWindow -->|_transcribe_audio| NTF[NamedTemporaryFile suffix=.m4a BUG]
    NTF -->|wrong extension| Whisper[Groq Whisper]
    Whisper --> Fail[InvalidRequestError / empty transcript]

Reproduction Details

1. Start a glasses-mode capture session (captureMode: "audio_video").
2. Upload audio chunks to POST /sessions/{sessionId}/audio?windowIndex=0 (stored as .wav in S3).
3. Trigger IngestWindowFunction with frameCount > 0 and matching audio.
4. Observe: _transcribe_audio writes WAV bytes to a .m4a temp file and passes it to Groq Whisper.
5. Groq Whisper fails (format mismatch) → window is not processed.

Reproduction test (unit preferred): main/server/tests/unit/test_ingest_window_transcribe.py
- TestTranscribeAudioTempFileSuffix::test_uses_wav_suffix: FAILS before fix, PASSES after.
- TestTranscribeAudioTempFileSuffix::test_does_not_use_m4a_suffix: FAILS before fix, PASSES after.

Notes for PR

Root cause: a single wrong constant in _transcribe_audio.

The function was written when the only audio format was M4A (full-session legacy uploads). When per-window chunked WAV uploads were added, the constant was never updated.

Fix: change suffix=".m4a" to suffix=".wav" in tempfile.NamedTemporaryFile(...).
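As an illustration of the corrected temp-file step, here is a minimal sketch. It is not the actual body of _transcribe_audio: the function name transcribe_wav_bytes and the injected transcribe callable (standing in for the Groq Whisper call) are assumptions made so the example runs on its own.

```python
import os
import tempfile
from typing import Callable


def transcribe_wav_bytes(audio_bytes: bytes, transcribe: Callable[[str], str]) -> str:
    """Write WAV bytes to a temp file whose extension matches the content, then transcribe.

    `transcribe` is a hypothetical callable that takes a file path and returns text.
    """
    # delete=False so the file can be reopened by path after the handle is closed.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_bytes)
        tmp_path = tmp.name
    try:
        return transcribe(tmp_path)
    finally:
        # Always clean up the temp file, even if transcription raises.
        os.remove(tmp_path)


if __name__ == "__main__":
    # Stand-in transcriber that just reports the extension it was handed.
    print(transcribe_wav_bytes(b"RIFF", lambda path: os.path.splitext(path)[1]))
```

The only part that corresponds to the fix is the suffix=".wav" argument; everything else is scaffolding for the example.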

The legacy fallback path (sessions/{sessionId}/audio/full.m4a) can still be M4A, so no changes are needed there — the fix only affects the temp file extension used when transcribing, which must match the actual content format.
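The applied fix is the constant change only. If a single temp-file step ever had to serve both the per-window WAV path and the legacy M4A path, one possible way to keep the extension aligned with the content would be to derive it from the S3 key. The helper below is a hypothetical sketch of that idea and is not part of the fix.

```python
import os


def temp_suffix_for_key(s3_key: str, default: str = ".wav") -> str:
    """Return the S3 key's extension so the temp file matches the stored format.

    Hypothetical helper; falls back to `default` when the key has no extension.
    """
    ext = os.path.splitext(s3_key)[1].lower()
    return ext or default


if __name__ == "__main__":
    print(temp_suffix_for_key("sessions/abc/window_000/audio.wav"))
    print(temp_suffix_for_key("sessions/abc/audio/full.m4a"))
```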

Audit Log

ID	Action	Note	Context
1	Create audit log	Initialize bug investigation	Glasses video upload fails to process on 2026-04-26
2	Read audio upload lambda	Confirmed S3 key is `audio.wav` for per-window chunks	`api/sessions/audio/app.py` line 57
3	Read ingest_window.py	Found `suffix=".m4a"` in `_transcribe_audio` at line 415	Root cause identified
4	Write failing repro test	`test_uses_wav_suffix` fails with `.m4a` before fix	`tests/unit/test_ingest_window_transcribe.py`
5	Apply fix	Changed `suffix=".m4a"` → `suffix=".wav"`	`ingest_window.py` line 415
6	Confirm tests pass	Both regression tests green	`pytest tests/unit/test_ingest_window_transcribe.py`
7	Run full unit suite	122 tests pass; 16 pre-existing failures (missing numpy/moto, unrelated)	`pytest tests/unit/`

Verification

[x] Reproduced failure before fix
[x] Reproduction test fails before fix
[x] Root cause identified with evidence
[x] Fix applied at source (no workaround-only patch)
[x] Reproduction test passes after fix
[x] Reproduction path now passes
[x] Regression test added/updated
[x] Verified no duplicate solved-bug log exists for same root cause