Glasses Video Processing: WAV Audio Written to .m4a Temp File
Metadata
- Date: 2026-04-26
- Status: fixed
- Severity: high
- Related issue/ticket: N/A
- Owner: N/A
About
Overview:
- Glasses-mode (captureMode: "audio_video") video upload sessions fail silently at the transcription step inside IngestWindowFunction.
- The mobile app uploads 30-second WAV audio chunks to S3 at sessions/{sessionId}/window_{N:03d}/audio.wav.
- When IngestWindowFunction downloads that audio and passes it to Groq Whisper for transcription, it writes the WAV bytes to a tempfile.NamedTemporaryFile with suffix=".m4a".
- Groq Whisper detects the file format by extension. Providing a .m4a file name containing raw WAV data causes a format mismatch: the API raises an InvalidRequestError or returns an empty transcript, breaking the entire window-processing pipeline.
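To make the key layout in the overview concrete, here is a minimal sketch of how the per-window key could be built and how its extension could be read back. The helper name `window_audio_key` is hypothetical and is not taken from the codebase; only the key pattern itself comes from this report.

```python
import os


def window_audio_key(session_id: str, window_index: int) -> str:
    """Build the per-window audio key (hypothetical helper).

    The index is zero-padded to three digits, matching window_{N:03d}.
    """
    return f"sessions/{session_id}/window_{window_index:03d}/audio.wav"


def key_suffix(key: str) -> str:
    """Return the file extension of an S3 key, including the leading dot."""
    return os.path.splitext(key)[1]
```

The extension of every key produced this way is .wav, which is the format the temp file name needs to reflect.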
Technical Questions:
- Are we making assumptions? No. The suffix .m4a in the source code is an explicit, clearly wrong constant.
- How old is this bug? Introduced when the per-window audio chunk upload flow was added (chunked WAV path). The function was originally written for legacy full-session M4A uploads.
- Is there anything obvious we might have missed? Yes. The legacy S3 fallback path (sessions/{sessionId}/audio/full.m4a) is M4A audio, but the per-window path is always WAV. The suffix was never updated when the new upload path was introduced.
- Are there specific system states required to reproduce it? Glasses mode (captureMode: "audio_video") with audio chunks uploaded via the new per-window path.
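When checking which of the two formats a given S3 object actually holds, the container header is more reliable than the key name. The following is a diagnostic sketch only and is not part of the applied fix. The function name `sniff_audio_suffix` is hypothetical, and the `ftyp` check is coarse: it matches any ISO base media file (the MP4 family that M4A belongs to), not M4A specifically.

```python
from typing import Optional


def sniff_audio_suffix(data: bytes) -> Optional[str]:
    """Guess a file extension from the first bytes of an audio payload.

    Returns ".wav" for RIFF/WAVE data, ".m4a" for ISO base media files
    (identified by an 'ftyp' box type at offset 4), and None otherwise.
    """
    # WAV: bytes 0-3 are "RIFF", bytes 8-11 are "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"
    # MP4 family: bytes 0-3 are the box size, bytes 4-7 are "ftyp".
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return ".m4a"
    return None
```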
Resources:
- main/server/worldmm/pipeline/ingest_window.py: _transcribe_audio function, line 415
- main/server/api/sessions/audio/app.py: _handle_chunk_upload, confirms S3 key is audio.wav
- main/server/tests/unit/test_ingest_window_transcribe.py: regression test
Steps to cause failure
flowchart LR
GlassesAudio([Glasses WAV audio]) -->|POST audio?windowIndex=N| AudioPost[AudioPostFunction]
AudioPost -->|PutObject audio.wav| S3[(S3)]
S3 -->|GetObject| IngestWindow[IngestWindowFunction]
IngestWindow -->|write WAV bytes to .m4a tmp file| Whisper([Groq Whisper API])
Whisper -->|format mismatch error or empty transcript| Failure([Window processing fails])
System
flowchart TD
App[Mobile App] -->|WAV chunks via onAudioChunkReady| AudioPost[AudioPostFunction]
AudioPost -->|sessions/id/window_N/audio.wav| S3
S3 -->|_load_audio_from_s3| IngestWindow
IngestWindow -->|_transcribe_audio| NTF[NamedTemporaryFile suffix=.m4a BUG]
NTF -->|wrong extension| Whisper[Groq Whisper]
Whisper --> Fail[InvalidRequestError / empty transcript]
Reproduction Details
- Start a glasses-mode capture session (captureMode: "audio_video").
- Upload audio chunks to POST /sessions/{sessionId}/audio?windowIndex=0; they are stored as .wav in S3.
- Trigger IngestWindowFunction with frameCount > 0 and matching audio.
- Observe: _transcribe_audio writes WAV bytes to a .m4a temp file and passes it to Groq Whisper.
- Groq Whisper fails (format mismatch) → window is not processed.
Reproduction test (unit preferred): main/server/tests/unit/test_ingest_window_transcribe.py
- TestTranscribeAudioTempFileSuffix::test_uses_wav_suffix: FAILS before fix, PASSES after.
- TestTranscribeAudioTempFileSuffix::test_does_not_use_m4a_suffix: FAILS before fix, PASSES after.
Notes for PR
Root cause: a single wrong constant in _transcribe_audio.
The function was written when the only audio format was M4A (full-session legacy uploads). When per-window chunked WAV uploads were added, the constant was never updated.
Fix: change suffix=".m4a" to suffix=".wav" in tempfile.NamedTemporaryFile(...).
The legacy fallback path (sessions/{sessionId}/audio/full.m4a) can still be M4A, so no changes are needed there — the fix only affects the temp file extension used when transcribing, which must match the actual content format.
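The shape of the fix can be sketched as follows. This is an illustration under stated assumptions, not the code in ingest_window.py: the function name `transcribe_wav_bytes` and the `transcribe` callback are hypothetical, and the callback stands in for the Groq Whisper call so that no client API is guessed at.

```python
import os
import tempfile
from typing import BinaryIO, Callable


def transcribe_wav_bytes(
    audio_bytes: bytes,
    transcribe: Callable[[BinaryIO], str],
) -> str:
    """Write WAV bytes to a .wav temp file and hand it to a transcriber.

    `transcribe` is a caller-supplied callable (hypothetical) that receives
    the opened file; an extension-sensitive API sees a name ending in .wav.
    """
    # The suffix matches the content format (WAV), which is the actual fix.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_bytes)
        tmp_path = tmp.name
    try:
        with open(tmp_path, "rb") as audio_file:
            return transcribe(audio_file)
    finally:
        # Remove the temp file even if transcription raises.
        os.remove(tmp_path)
```

Passing a callback that simply returns the extension of the file name it receives is an easy way to confirm that the transcriber sees .wav.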
Audit Log
| ID | Action | Note | Context |
|---|---|---|---|
| 1 | Create audit log | Initialize bug investigation | Glasses video upload fails to process on 2026-04-26 |
| 2 | Read audio upload lambda | Confirmed S3 key is audio.wav for per-window chunks | api/sessions/audio/app.py line 57 |
| 3 | Read ingest_window.py | Found suffix=".m4a" in _transcribe_audio at line 415 | Root cause identified |
| 4 | Write failing repro test | test_uses_wav_suffix fails with .m4a before fix | tests/unit/test_ingest_window_transcribe.py |
| 5 | Apply fix | Changed suffix=".m4a" → suffix=".wav" | ingest_window.py line 415 |
| 6 | Confirm tests pass | Both regression tests green | pytest tests/unit/test_ingest_window_transcribe.py |
| 7 | Run full unit suite | 122 tests pass; 16 pre-existing failures (missing numpy/moto, unrelated) | pytest tests/unit/ |
Verification
- [x] Reproduced failure before fix
- [x] Reproduction test fails before fix
- [x] Root cause identified with evidence
- [x] Fix applied at source (no workaround-only patch)
- [x] Reproduction test passes after fix
- [x] Reproduction path now passes
- [x] Regression test added/updated
- [x] Verified no duplicate solved-bug log exists for same root cause