Phone-Only Recording

Plan Metadata

  • Plan type: plan
  • Parent plan: N/A
  • Depends on: N/A
  • Status: approved

Status semantics:

  • draft: Plan is being created or updated and is not final.
  • approved: Plan is approved but not yet applied in code.
  • documentation: Code currently exists and matches the plan contract.

Update rule:

  • When an existing plan is edited, set status to draft until re-approved.

System Intent

  • What is being built: Phone-only audio recording path in capture-session.ts. When the user selects "phone" as their recording device, the phone's built-in microphone captures audio in 30-second windowed WAV chunks (matching the glasses audio-only flow) and uploads them through the existing PersistentUploadQueue and /sessions/{sessionId}/audio?windowIndex=<n> API.
  • Primary consumer(s): startCapture / stopCapture in main/app/lib/capture-session.ts, triggered via CaptureButton → recording-control-orchestrator.
  • Boundary (black-box scope only): The phone-only recording path is entirely inside capture-session.ts and a new phone-audio-capture.ts helper. No changes to the server API, PersistentUploadQueue, recording-control-orchestrator, or UI components.

Stage Gate Tracker

  • [x] Stage 1 Mermaid approved
  • [x] Stage 2 I/O contracts approved
  • [x] Stage 3 pseudocode/technical details approved or skipped

1. Mermaid Diagram

flowchart TD
  UserRecord([User — Record]) -->|tap record| RecordingOrchestrator["recording-control-orchestrator\nmain/app/lib/recording-control-orchestrator.ts"]:::unchanged
  RecordingOrchestrator -->|startCapture call| CaptureSession["capture-session.ts\nmain/app/lib/capture-session.ts"]:::updated
  CaptureSession -->|startPhoneAudioCapture call| PhoneAudioCapture["phone-audio-capture.ts\nmain/app/lib/phone-audio-capture.ts"]:::created
  PhoneAudioCapture -->|requestPermissionsAsync| ExpoAV["expo-av Audio.Recording\nexternal SDK"]:::unchanged
  ExpoAV -->|30-second WAV chunk URI| PhoneAudioCapture
  PhoneAudioCapture -->|AudioChunkEvent filePath and windowIndex| CaptureSession
  CaptureSession -->|enqueue audio/wav with sessionId and windowIndex| UploadQueue["PersistentUploadQueue\nmain/app/lib/persistent-upload-queue.ts"]:::unchanged
  UploadQueue -->|POST audio/wav windowIndex=n| SessionsAPI["Server API\n/sessions/id/audio"]:::unchanged
  SessionsAPI -->|stored| S3["S3 — external storage"]:::unchanged
  SessionsAPI -->|triggers on window complete| IngestLambda["Ingest Lambda — external"]:::unchanged

  UserStop([User — Stop]) -->|tap stop| RecordingOrchestrator
  RecordingOrchestrator -->|stopCapture call| CaptureSession
  CaptureSession -->|stopPhoneAudioCapture call| PhoneAudioCapture
  PhoneAudioCapture -->|final partial WAV chunk via callback| CaptureSession
  CaptureSession -->|flush queue then POST end| SessionsAPI

classDef unchanged fill:#d3d3d3,stroke:#666,stroke-width:1px;
classDef updated fill:#ffe58a,stroke:#666,stroke-width:1px;
classDef created fill:#a8e6a3,stroke:#666,stroke-width:1px;

2. Black-Box Inputs and Outputs

Global Types

RecordingDevice {
  value: "glasses" | "glasses-audio" | "phone"
}

AudioChunkEvent {
  filePath: string   (absolute file:// URI to WAV file on device)
  windowIndex: number (0-based sequential chunk index)
  durationMs: number  (actual duration of this chunk)
  sizeBytes: number   (file size in bytes)
}

UploadItem {
  id: string
  type: "audio"
  uri: string          (file:// URI)
  sessionId: string
  windowIndex: number
  retryCount: number
  sizeBytes: number
}
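
The relationship between AudioChunkEvent and UploadItem can be sketched as below. This is a minimal illustration, not the queue's actual code: toUploadItem and the id format are hypothetical, and the plan does not say how PersistentUploadQueue builds its items internally.

```typescript
// Shapes copied from the Global Types above.
interface AudioChunkEvent {
  filePath: string;    // absolute file:// URI to the WAV file on device
  windowIndex: number; // 0-based sequential chunk index
  durationMs: number;  // actual duration of this chunk
  sizeBytes: number;   // file size in bytes
}

interface UploadItem {
  id: string;
  type: "audio";
  uri: string;
  sessionId: string;
  windowIndex: number;
  retryCount: number;
  sizeBytes: number;
}

// Hypothetical helper: builds the queue item for one finished chunk.
// The id format is an assumption chosen for readability.
function toUploadItem(sessionId: string, event: AudioChunkEvent): UploadItem {
  return {
    id: `${sessionId}-audio-${event.windowIndex}`,
    type: "audio",
    uri: event.filePath,
    sessionId,
    windowIndex: event.windowIndex,
    retryCount: 0, // a freshly enqueued item has not been retried yet
    sizeBytes: event.sizeBytes,
  };
}
```

Note that durationMs has no counterpart in UploadItem, so in this sketch it is only available to the caller that receives the chunk event.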

Flow: startPhoneAudioCapture

  • Test files: main/app/__tests__/phone-audio-capture.test.ts
  • Core files: main/app/lib/phone-audio-capture.ts

Type Definitions

StartPhoneAudioCaptureInput {
  onChunkReady: (event: AudioChunkEvent) => void   (callback invoked per 30-second WAV chunk)
}

StartPhoneAudioCaptureOutput {
  void   (resolves when recording starts; rejects on permission denied or hardware error)
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| startPhoneAudioCapture.success | StartPhoneAudioCaptureInput | void; expo-av recording active, 30-second chunking timer running | happy path | requests microphone permission; creates expo-av Audio.Recording with WAV preset; starts 30-second interval to finalize and restart recording | Y |
| startPhoneAudioCapture.permission-denied | StartPhoneAudioCaptureInput | throws Error("Microphone permission denied") | error | no recording started; caller (startCapture) catches and calls cleanupSession | Y |
| startPhoneAudioCapture.already-active | StartPhoneAudioCaptureInput | throws Error("Phone audio capture already active") | error | guard against double-start | Y |
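
The three start paths above can be exercised without a device if the expo-av calls sit behind a small seam. The sketch below is one way to do that under stated assumptions: RecorderDeps, CaptureState and createPhoneCaptureStarter are hypothetical names, and the "starting" state is an addition so that two overlapping start calls cannot both pass the guard.

```typescript
// Hypothetical seam standing in for the expo-av permission and recording calls.
interface RecorderDeps {
  requestPermission(): Promise<boolean>;
  startRecording(): Promise<void>;
}

type CaptureState = "idle" | "starting" | "active";

function createPhoneCaptureStarter(deps: RecorderDeps) {
  let state: CaptureState = "idle";
  return {
    async start(): Promise<void> {
      // already-active path: reject a second start, including one that
      // arrives while the first is still waiting on the permission prompt
      if (state !== "idle") throw new Error("Phone audio capture already active");
      state = "starting";
      try {
        // permission-denied path: nothing has been recorded yet
        if (!(await deps.requestPermission())) {
          throw new Error("Microphone permission denied");
        }
        await deps.startRecording();
        state = "active";
      } catch (error) {
        state = "idle"; // no recording is running, so a later start may retry
        throw error;
      }
    },
    getState: (): CaptureState => state,
  };
}
```

Resetting to "idle" on failure keeps the permission-denied path retryable, which matches the contract that no recording is started in that case.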

Flow: stopPhoneAudioCapture

  • Test files: main/app/__tests__/phone-audio-capture.test.ts
  • Core files: main/app/lib/phone-audio-capture.ts

Type Definitions

StopPhoneAudioCaptureInput {
  void
}

StopPhoneAudioCaptureOutput {
  void   (resolves after final partial chunk is finalized and onChunkReady fired)
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| stopPhoneAudioCapture.success | void | void; chunk interval cleared; current recording stopped; final partial WAV chunk delivered via onChunkReady | happy path | partial chunk (< 30s) is still delivered so no audio is lost at session end | Y |
| stopPhoneAudioCapture.not-active | void | void (no-op) | subpath | idempotent; safe to call when not recording | Y |
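
The two stop paths above can be sketched as follows. ActiveCapture and createStopper are hypothetical names used only for illustration; the point is the ordering: mark the capture as stopped first, cancel the timer, then deliver the partial chunk before resolving.

```typescript
interface FinishedRecording {
  filePath: string;
  durationMs: number;
  sizeBytes: number;
}

interface ChunkEvent extends FinishedRecording {
  windowIndex: number;
}

// Hypothetical handle for an in-progress capture; none of these names come from expo-av.
interface ActiveCapture {
  nextWindowIndex: number;
  cancelTimer(): void;
  stopRecording(): Promise<FinishedRecording>;
  onChunkReady(event: ChunkEvent): void;
}

function createStopper(initial: ActiveCapture | null): () => Promise<void> {
  let active = initial;
  return async function stop(): Promise<void> {
    const capture = active;
    if (capture === null) return; // not-active path: no-op
    active = null;                // a second call now takes the no-op path
    capture.cancelTimer();        // no further 30-second rotations
    const finished = await capture.stopRecording();
    // deliver the partial chunk before resolving so the caller can flush it
    capture.onChunkReady({ ...finished, windowIndex: capture.nextWindowIndex });
  };
}
```

Because the callback is invoked before the promise resolves, a caller that awaits stop and then flushes its queue will include the final chunk.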

Flow: startCapture (phone-mode branch)

  • Test files: main/app/__tests__/capture-session.test.ts
  • Core files: main/app/lib/capture-session.ts

Type Definitions

StartCaptureInput {
  selectedDevice: RecordingDevice   (read from recording-device-preference)
  sdkAvailable: boolean             (WearablesModule.isAnyDeviceAvailable())
}

StartCaptureOutput {
  void   (session created, phone audio capture running, upload queue active)
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| startCapture.phone.success | selectedDevice=phone | void; server session created with captureMode=audio_only; PersistentUploadQueue initialized; phone audio capture active | happy path | mirrors glasses-audio flow: same captureMode, same upload queue, same session lifecycle | Y |
| startCapture.phone.permission-denied | selectedDevice=phone | throws; server session cleaned up via cleanupSession() | error | session is opened before capture starts (same as glasses flow); must be ended on failure | Y |
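
Because the session is opened before capture starts, the permission-denied path has to undo it. The sketch below shows that ordering with injected dependencies. StartCaptureDeps and startPhoneCaptureSketch are hypothetical, and passing the session id to cleanupSession is an assumption made to keep the example self-contained (the plan calls cleanupSession() without arguments).

```typescript
// Hypothetical seam around the calls made by the phone branch of startCapture.
interface StartCaptureDeps {
  createSession(captureMode: "audio_only"): Promise<string>; // resolves to the session id
  startPhoneAudioCapture(): Promise<void>;
  cleanupSession(sessionId: string): Promise<void>;
}

async function startPhoneCaptureSketch(deps: StartCaptureDeps): Promise<string> {
  const sessionId = await deps.createSession("audio_only");
  try {
    await deps.startPhoneAudioCapture();
  } catch (error) {
    // the server session already exists, so end it before surfacing the error
    await deps.cleanupSession(sessionId);
    throw error;
  }
  return sessionId;
}
```

The error is re-thrown after cleanup so the caller still sees the original failure reason.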

Flow: stopCapture (phone-mode branch)

  • Test files: main/app/__tests__/capture-session.test.ts
  • Core files: main/app/lib/capture-session.ts

Type Definitions

StopCaptureInput {
  activeRecordingDevice: "phone"   (captured at startCapture time)
}

StopCaptureOutput {
  void   (phone audio stopped, final chunk enqueued, queue flushed, session ended)
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| stopCapture.phone.success | activeRecordingDevice=phone | void; phone audio stopped; final chunk enqueued; upload queue flushed; /sessions/{id}/end called | happy path | mirrors glasses-audio teardown path: stop audio → flush queue → end session | Y |
| stopCapture.phone.stop-error | activeRecordingDevice=phone + stopPhoneAudioCapture throws | partial stop; error logged; queue still flushed; session still ended | error | same resilience pattern as glasses teardown: errors are caught and logged, not re-thrown before session end | Y |
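
The resilience requirement in the stop-error path (log the failure, still flush, still end the session) can be sketched like this. StopCaptureDeps and its member names are hypothetical stand-ins for the real logger, queue and API calls.

```typescript
// Hypothetical seam around the calls made by the phone branch of stopCapture.
interface StopCaptureDeps {
  stopPhoneAudioCapture(): Promise<void>;
  flushQueue(): Promise<void>;
  endSession(): Promise<void>;
  log(entry: { step: string; error?: string }): void;
}

async function stopPhoneCaptureSketch(deps: StopCaptureDeps): Promise<void> {
  try {
    await deps.stopPhoneAudioCapture();
    deps.log({ step: "capture_phone_audio_stopped" });
  } catch (error) {
    // caught and logged, not re-thrown, so the steps below always run
    deps.log({
      step: "capture_phone_audio_stop_error",
      error: error instanceof Error ? error.message : String(error),
    });
  }
  await deps.flushQueue(); // upload whatever chunks were already enqueued
  await deps.endSession(); // corresponds to the /sessions/{id}/end call
}
```

Chunks enqueued before the failure are still uploaded, so a stop error costs at most the final partial window.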

3. Pseudocode / Technical Details for Critical Flows (Optional)

New file: main/app/lib/phone-audio-capture.ts

import { Audio } from "expo-av";

// RECORDING_OPTIONS is the options object chosen under "Audio recording options" below.
// getFileSize(uri) is a helper, not defined in this plan, that returns the file size in bytes.

const CHUNK_DURATION_MS = 30_000;

let activeRecording: Audio.Recording | null = null;
let chunkTimer: ReturnType<typeof setInterval> | null = null;
let windowIndex = 0;
let chunkCallback: ((event: AudioChunkEvent) => void) | null = null;

// Stops one recording and delivers it to chunkCallback as the next window.
async function finalizeChunk(recording: Audio.Recording): Promise<void> {
  const status = await recording.stopAndUnloadAsync();
  const uri = recording.getURI();             // file:// URI of the on-device audio file
  if (!uri) return;
  const sizeBytes = await getFileSize(uri);
  // durationMillis is the recorded length, so a partial final chunk reports its real duration
  chunkCallback?.({ filePath: uri, windowIndex, durationMs: status.durationMillis, sizeBytes });
  windowIndex++;
}

async function startNewChunk(): Promise<void> {
  const recording = new Audio.Recording();
  await recording.prepareToRecordAsync(RECORDING_OPTIONS);
  await recording.startAsync();
  activeRecording = recording;
}

export async function startPhoneAudioCapture(
  onChunkReady: (event: AudioChunkEvent) => void,
): Promise<void> {
  if (activeRecording !== null) throw new Error("Phone audio capture already active");
  const { granted } = await Audio.requestPermissionsAsync();
  if (!granted) throw new Error("Microphone permission denied");
  await Audio.setAudioModeAsync({ allowsRecordingIOS: true, playsInSilentModeIOS: true });
  windowIndex = 0;
  chunkCallback = onChunkReady;
  await startNewChunk();
  chunkTimer = setInterval(async () => {
    const currentRecording = activeRecording;
    if (!currentRecording) return;
    // expo-av allows only one prepared Recording at a time, so the current chunk
    // is finalized before the next one starts (a short gap between chunks is possible)
    await finalizeChunk(currentRecording);
    await startNewChunk();
  }, CHUNK_DURATION_MS);
}

export async function stopPhoneAudioCapture(): Promise<void> {
  if (!activeRecording) return;
  if (chunkTimer !== null) clearInterval(chunkTimer);
  chunkTimer = null;
  const last = activeRecording;
  activeRecording = null;
  try {
    await finalizeChunk(last);       // flush partial final chunk while chunkCallback is still set
  } finally {
    chunkCallback = null;            // cleared only after the final chunk has been delivered
  }
}
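
The chunk rotation can also be sketched independently of expo-av. The version below assumes the recorder supports only one active recording at a time, so it stops the current chunk before starting the next; that ordering is an assumption of this sketch, as are the names SingleRecorder and createChunkRotator. The rotating flag skips a timer tick that fires while the previous rotation is still in progress.

```typescript
interface RotatedChunk {
  uri: string;
  durationMs: number;
  windowIndex: number;
}

// Hypothetical recorder seam: at most one recording can be active at a time.
interface SingleRecorder {
  start(): Promise<void>;
  stop(): Promise<{ uri: string; durationMs: number }>;
}

function createChunkRotator(recorder: SingleRecorder, onChunk: (chunk: RotatedChunk) => void) {
  let windowIndex = 0;
  let rotating = false;
  return {
    // Intended to be called by the 30-second timer.
    async rotate(): Promise<void> {
      if (rotating) return; // skip a tick that overlaps the previous rotation
      rotating = true;
      try {
        const finished = await recorder.stop(); // finalize the current window
        await recorder.start();                 // begin the next window right away
        onChunk({ ...finished, windowIndex });
        windowIndex++;
      } finally {
        rotating = false;
      }
    },
  };
}
```

Restarting the recorder before invoking the callback keeps the gap between windows as short as the recorder allows, since upload work in the callback does not delay the next recording.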

Changes to main/app/lib/capture-session.ts

// In startCapture():
if (useWearablesCapture) {
  // existing glasses path unchanged
} else {
  // PHONE PATH (was empty stub)
  logger({ step: "capture_phone_mode_starting" });
  startAudioChunkListener();   // reuse same callback → enqueues to uploadQueue
  // But instead of WearablesModule, use phone-audio-capture:
  await startPhoneAudioCapture((event) => {
    uploadQueue?.enqueue("audio", toFileUri(event.filePath), {
      windowIndex: event.windowIndex,
      sizeBytes: event.sizeBytes,
    });
  });
  logger({ step: "capture_phone_mode_started" });
}

// In stopCapture():
if (startedWithPhone) {         // activeRecordingDevice === "phone"
  try {
    await stopPhoneAudioCapture();  // flushes last chunk via callback
    logger({ step: "capture_phone_audio_stopped" });
  } catch (e) {
    // e is unknown in TypeScript, so narrow it before reading the message
    const error = e instanceof Error ? e.message : String(e);
    logger({ step: "capture_phone_audio_stop_error", additional: { error } });
  }
  // flush queue (same as glasses-audio path)
  if (uploadQueue) {
    await uploadQueue.flush();
  }
  // then fall through to /sessions/{id}/end (existing code)
}

captureMode sent to server

Phone mode sends captureMode: "audio_only" (same as glasses-audio), since no frames are captured. The existing code currently sends audio_video as a fallback for phone; this will change to audio_only.
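
As a small illustration, the device-to-captureMode rule can be written as one exhaustive mapping. The "phone" and "glasses-audio" cases follow the text above; mapping "glasses" to audio_video is an assumption of this sketch, and captureModeFor is a hypothetical helper name.

```typescript
type RecordingDeviceValue = "glasses" | "glasses-audio" | "phone";
type CaptureMode = "audio_video" | "audio_only";

// Hypothetical helper: picks the captureMode sent when the session is created.
function captureModeFor(device: RecordingDeviceValue): CaptureMode {
  switch (device) {
    case "glasses":
      return "audio_video"; // assumption: the full glasses flow also captures frames
    case "glasses-audio":
    case "phone":
      return "audio_only";  // no frames are captured in either mode
  }
}
```

Keeping the switch exhaustive means adding a new device value produces a compile error until its capture mode is decided.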

expo-av dependency

expo-av must be added to main/app/package.json. It is a first-party Expo package (no native linking ceremony beyond npx expo install). The Audio.Recording API is available on iOS and Android.

Audio recording options

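The existing /sessions/{id}/audio endpoint accepts audio/wav, so recordings must be 16-bit 44.1kHz mono WAV. Audio.RecordingOptionsPresets.HIGH_QUALITY does not produce that format as-is: in expo-av this preset records AAC into an .m4a file at 44.1kHz with two channels. The recorder therefore needs custom recording options rather than the preset. On iOS, expo-av exposes a linear PCM output format that can be written with a .wav extension. The Android output formats exposed by expo-av do not appear to include WAV, so the Android path must be verified before implementation.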
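
For sizing the upload queue, the expected size of a chunk in the 16-bit 44.1kHz mono WAV format can be worked out directly. The sketch below assumes a canonical 44-byte PCM WAV header; some encoders write a larger header, so treat the result as an estimate. expectedWavBytes is a hypothetical helper.

```typescript
// Canonical PCM WAV header size (assumption: no extra chunks in the header).
const WAV_HEADER_BYTES = 44;

function expectedWavBytes(
  durationMs: number,
  sampleRateHz = 44_100,
  bitsPerSample = 16,
  channels = 1,
): number {
  // 44,100 samples/s * 2 bytes/sample * 1 channel = 88,200 bytes/s
  const bytesPerSecond = sampleRateHz * (bitsPerSample / 8) * channels;
  return WAV_HEADER_BYTES + Math.round((bytesPerSecond * durationMs) / 1000);
}

console.log(expectedWavBytes(30_000)); // 2646044
```

A full 30-second window is therefore about 2.6 MB, and a partial final chunk scales linearly with its duration.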

Implementation notes

  • activeRecordingDevice is captured at startCapture time and used in stopCapture — ensures the correct teardown path runs even if the user changes their preference mid-session. This is consistent with the existing glasses teardown guard.
  • stopPhoneAudioCapture must finalize the last partial chunk before resolving so the queue can flush it. This mirrors WearablesModule.stopAudioCapture() which fires one last onAudioChunkReady event for the partial buffer before resolving.
  • No frames are captured in phone mode. startFramePolling / stopFramesTeardown are not called.
  • The cleanupSession function (used on error) currently calls WearablesModule.stopAudioCapture() and WearablesModule.stopStreamSession(). It must also call stopPhoneAudioCapture() when the device was phone to avoid a leak.
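
The first and last notes above can be sketched together: snapshot the device when capture starts, then let cleanup branch on the snapshot rather than on the current preference. All names here are hypothetical, and the plan does not say whether the WearablesModule stop calls should still run in phone mode; this sketch skips them.

```typescript
type DeviceValue = "glasses" | "glasses-audio" | "phone";

// Hypothetical seam around the two teardown paths.
interface TeardownDeps {
  stopPhoneAudioCapture(): Promise<void>;
  stopWearablesCapture(): Promise<void>; // stands in for the WearablesModule stop calls
}

function createSessionTeardown(readPreference: () => DeviceValue, deps: TeardownDeps) {
  let activeRecordingDevice: DeviceValue | null = null;
  return {
    // Called at startCapture time: freeze the device for the whole session.
    start(): void {
      activeRecordingDevice = readPreference();
    },
    // Called on error or stop: uses the frozen value, not the live preference.
    async cleanup(): Promise<void> {
      const device = activeRecordingDevice;
      activeRecordingDevice = null;
      if (device === null) return; // nothing was started
      if (device === "phone") {
        await deps.stopPhoneAudioCapture();
      } else {
        await deps.stopWearablesCapture();
      }
    },
  };
}
```

If the user switches the preference to glasses mid-session, cleanup still stops the phone recorder, which is the behavior the notes ask for.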

After all stages are approved, apply .agent/skills/reconcile-plans/SKILL.md to propagate contract updates across linked plans.