Glasses Video Processing: WAV Audio Written to .m4a Temp File

Metadata

  • Date: 2026-04-26
  • Status: fixed
  • Severity: high
  • Related issue/ticket: N/A
  • Owner: N/A

About

Overview:

  • Glasses-mode (captureMode: "audio_video") video upload sessions fail silently at the transcription step inside IngestWindowFunction.
  • The mobile app uploads 30-second WAV audio chunks to S3 at sessions/{sessionId}/window_{N:03d}/audio.wav.
  • When IngestWindowFunction downloads that audio and passes it to Groq Whisper for transcription, it writes the WAV bytes to a tempfile.NamedTemporaryFile with suffix=".m4a".
  • Groq Whisper detects the file format by extension. Passing a file named .m4a that contains raw WAV data causes a format mismatch: the API raises an InvalidRequestError or returns an empty transcript, breaking the entire window-processing pipeline.
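The two S3 key layouts mentioned in this report can be sketched as small helpers. The function names below are hypothetical and do not come from the codebase; only the key patterns are taken from this document.

```python
def window_audio_key(session_id: str, window_index: int) -> str:
    """Per-window chunk key; the window index is zero-padded to 3 digits."""
    if window_index < 0:
        raise ValueError("window_index must be non-negative")
    return f"sessions/{session_id}/window_{window_index:03d}/audio.wav"


def legacy_audio_key(session_id: str) -> str:
    """Legacy full-session key, which holds M4A audio."""
    return f"sessions/{session_id}/audio/full.m4a"


# Hypothetical session id, for illustration only.
print(window_audio_key("abc123", 7))
print(legacy_audio_key("abc123"))
```

The point of the sketch is that the extension in the key (.wav versus .m4a) already states the content format for each path.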

Technical Questions:

  • Are we making assumptions? No. The suffix .m4a in the source code is an explicit, clearly wrong constant.
  • How old is this bug? It was introduced when the per-window audio chunk upload flow was added (chunked WAV path). The function was originally written for legacy full-session M4A uploads.
  • Is there anything obvious we might have missed? Yes. The legacy S3 fallback path (sessions/{sessionId}/audio/full.m4a) is M4A audio, but the per-window path is always WAV. The suffix was never updated when the new upload path was introduced.
  • Are there specific system states required to reproduce it? Glasses mode (captureMode: "audio_video") with audio chunks uploaded via the new per-window path.

Resources:

  • main/server/worldmm/pipeline/ingest_window.py: _transcribe_audio function, line 415
  • main/server/api/sessions/audio/app.py: _handle_chunk_upload, confirms S3 key is audio.wav
  • main/server/tests/unit/test_ingest_window_transcribe.py: regression test

Steps to cause failure

flowchart LR
    GlassesAudio([Glasses WAV audio]) -->|POST audio?windowIndex=N| AudioPost[AudioPostFunction]
    AudioPost -->|PutObject audio.wav| S3[(S3)]
    S3 -->|GetObject| IngestWindow[IngestWindowFunction]
    IngestWindow -->|write WAV bytes to .m4a tmp file| Whisper([Groq Whisper API])
    Whisper -->|format mismatch error or empty transcript| Failure([Window processing fails])

System

flowchart TD
    App[Mobile App] -->|WAV chunks via onAudioChunkReady| AudioPost[AudioPostFunction]
    AudioPost -->|sessions/id/window_N/audio.wav| S3
    S3 -->|_load_audio_from_s3| IngestWindow
    IngestWindow -->|_transcribe_audio| NTF[NamedTemporaryFile suffix=.m4a BUG]
    NTF -->|wrong extension| Whisper[Groq Whisper]
    Whisper --> Fail[InvalidRequestError / empty transcript]

Reproduction Details

  1. Start a glasses-mode capture session (captureMode: "audio_video").
  2. Upload audio chunks via POST /sessions/{sessionId}/audio?windowIndex=0; they are stored as .wav in S3.
  3. Trigger IngestWindowFunction with frameCount > 0 and matching audio.
  4. Observe: _transcribe_audio writes WAV bytes to a .m4a temp file and passes it to Groq Whisper.
  5. Groq Whisper fails (format mismatch) → window is not processed.

Reproduction test (unit preferred): main/server/tests/unit/test_ingest_window_transcribe.py

  • TestTranscribeAudioTempFileSuffix::test_uses_wav_suffix: FAILS before fix, PASSES after.
  • TestTranscribeAudioTempFileSuffix::test_does_not_use_m4a_suffix: FAILS before fix, PASSES after.
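A minimal sketch of how such a suffix check might be written follows. It is not the actual test file: transcribe_audio here is a hypothetical stand-in for _transcribe_audio, and send_to_whisper is a placeholder callable instead of the real Groq client. The test spies on tempfile.NamedTemporaryFile and inspects the suffix it was called with.

```python
import os
import tempfile
from unittest import mock


def transcribe_audio(audio_bytes, send_to_whisper):
    """Hypothetical stand-in for _transcribe_audio (post-fix behaviour)."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        return send_to_whisper(path)
    finally:
        os.unlink(path)  # always clean up the temp file


class TestTranscribeAudioTempFileSuffix:
    def _captured_suffix(self):
        # Wrap the real factory so the file is still created,
        # while recording the arguments it was called with.
        real_ntf = tempfile.NamedTemporaryFile
        with mock.patch("tempfile.NamedTemporaryFile", wraps=real_ntf) as spy:
            transcribe_audio(b"RIFF\x00\x00\x00\x00WAVE", lambda path: "ok")
        return spy.call_args.kwargs.get("suffix")

    def test_uses_wav_suffix(self):
        assert self._captured_suffix() == ".wav"

    def test_does_not_use_m4a_suffix(self):
        assert self._captured_suffix() != ".m4a"
```

With suffix=".m4a" in the stand-in, both tests would fail, which mirrors the before-fix behaviour described above.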

Notes for PR

Root cause: a single wrong constant in _transcribe_audio.

The function was written when the only audio format was M4A (full-session legacy uploads). When per-window chunked WAV uploads were added, the constant was never updated.

Fix: change suffix=".m4a" to suffix=".wav" in tempfile.NamedTemporaryFile(...).
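As a rough illustration of the corrected behaviour, the sketch below writes a small WAV payload to a temp file with a matching .wav extension. The helper names make_silent_wav and write_temp_audio are hypothetical, and the real function also calls Groq Whisper, which is omitted here.

```python
import io
import os
import tempfile
import wave


def make_silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Produce a small valid WAV payload standing in for an uploaded chunk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def write_temp_audio(audio_bytes: bytes, suffix: str = ".wav") -> str:
    """Write audio to a temp file whose extension matches its content."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(audio_bytes)
        return tmp.name


path = write_temp_audio(make_silent_wav())
try:
    assert path.endswith(".wav")
    with open(path, "rb") as f:
        assert f.read(4) == b"RIFF"  # WAV files start with a RIFF header
finally:
    os.unlink(path)
```

The extension and the RIFF header now agree, which is the property the one-line fix restores.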

The legacy fallback path (sessions/{sessionId}/audio/full.m4a) can still be M4A, so no changes are needed there — the fix only affects the temp file extension used when transcribing, which must match the actual content format.
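The applied fix hard-codes ".wav". If one function ever had to serve both the per-window and the legacy paths, one option would be to derive the suffix from the S3 key. This is a hypothetical sketch, not the change that was made; temp_suffix_for_key and SUPPORTED_SUFFIXES are invented names.

```python
import posixpath

# Only the two formats named in this report; an assumption for illustration.
SUPPORTED_SUFFIXES = {".wav", ".m4a"}


def temp_suffix_for_key(s3_key: str, default: str = ".wav") -> str:
    """Pick the temp-file suffix from the S3 key's extension."""
    ext = posixpath.splitext(s3_key)[1].lower()
    return ext if ext in SUPPORTED_SUFFIXES else default


print(temp_suffix_for_key("sessions/s1/window_000/audio.wav"))
print(temp_suffix_for_key("sessions/s1/audio/full.m4a"))
```

This keeps the temp file extension tied to the source object without relying on a single constant.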

Audit Log

| ID | Action | Note | Context |
|----|--------|------|---------|
| 1 | Create audit log | Initialize bug investigation | Glasses video upload fails to process on 2026-04-26 |
| 2 | Read audio upload lambda | Confirmed S3 key is audio.wav for per-window chunks | api/sessions/audio/app.py line 57 |
| 3 | Read ingest_window.py | Found suffix=".m4a" in _transcribe_audio at line 415 | Root cause identified |
| 4 | Write failing repro test | test_uses_wav_suffix fails with .m4a before fix | tests/unit/test_ingest_window_transcribe.py |
| 5 | Apply fix | Changed suffix=".m4a" → suffix=".wav" | ingest_window.py line 415 |
| 6 | Confirm tests pass | Both regression tests green | pytest tests/unit/test_ingest_window_transcribe.py |
| 7 | Run full unit suite | 122 tests pass; 16 pre-existing failures (missing numpy/moto, unrelated) | pytest tests/unit/ |

Verification

  • [x] Reproduced failure before fix
  • [x] Reproduction test fails before fix
  • [x] Root cause identified with evidence
  • [x] Fix applied at source (no workaround-only patch)
  • [x] Reproduction test passes after fix
  • [x] Reproduction path now passes
  • [x] Regression test added/updated
  • [x] Verified no duplicate solved-bug log exists for same root cause