
System Architecture

System Overview

1. Meta Ray-Ban Glasses Integration (main/app/services/meta-sdk-wrapper.ts)

Responsibilities: Streaming capture and manifest generation

  • Meta SDK Integration:
      • Uses the Meta Ray-Ban SDK for iOS/Android to stream audio and video
      • Built-in streaming support with real-time data capture
      • Virtual glasses available for development
  • User-Triggered Capture:
      • User manually triggers capture via "Hey Meta, remember this" or a button press
      • No automatic salience detection (battery optimization for v1)
      • User-defined memories provide training data for future ML models
  • Manifest Generation (sketched below):
      • Generates event IDs: {device_id}-{boot_session_id}-{counter}
      • Computes SHA256 hashes for captured files
      • Creates an Event Manifest JSON (v2.1.0) with timestamps, files, and privacy metadata
      • Includes preroll/postroll padding for context
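
A minimal TypeScript sketch of the ID and hashing steps, assuming a Node-style crypto digest (the Expo app would use an equivalent such as expo-crypto); buildEventId and hashFile are illustrative names, not the wrapper's actual API:

import { createHash } from "crypto";
import { readFile } from "fs/promises";

// Illustrative only: the real logic lives in services/meta-sdk-wrapper.ts.
function buildEventId(deviceId: string, bootSessionId: string, counter: number): string {
  // {device_id}-{boot_session_id}-{counter}, zero-padded as in the example manifest
  return `${deviceId}-${bootSessionId}-${String(counter).padStart(6, "0")}`;
}

async function hashFile(path: string): Promise<string> {
  // SHA256 of the file contents, used for content-addressed storage
  const data = await readFile(path);
  return createHash("sha256").update(data).digest("hex");
}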

2. Phone App (main/app/)

Responsibilities: Event aggregation and cloud sync

  • Built with Expo (React Native) for iOS/Android
  • Meta SDK Wrapper (services/meta-sdk-wrapper.ts): TypeScript integration with the Meta SDK
      • Handles streaming audio/video from the glasses
      • Generates event manifests from captured data
      • Manages stream start/stop and file capture
  • EventBuilder (services/event-builder.ts): State machine that merges overlapping events, deduplicates blobs by SHA256, and handles preroll/postroll overlaps
      • States: IDLE → RECEIVING → MERGING → EMITTING → IDLE
      • Simplified merge logic for user-triggered captures
  • UploadQueue (services/upload-queue.ts): SQLite-backed persistent queue with exponential-backoff retry (see the sketch after this list)
      • States: PENDING → UPLOADING → UPLOADED (with retry on FAILED)
      • Features: max 5 retries, 3 concurrent uploads, streaming upload support
      • Uploads happen in real time as data streams from the glasses
  • Database (services/database.ts): SQLite tables for the upload queue, manifest cache, and blob tracking
  • Uploads blobs to S3 and sends manifests to AWS API Gateway
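
A sketch of the retry policy implied above (max 5 retries with exponential backoff); the constant values and delay curve are assumptions, not the actual upload-queue code:

type UploadState = "PENDING" | "UPLOADING" | "UPLOADED" | "FAILED";

const MAX_RETRIES = 5;        // documented retry cap
const BASE_DELAY_MS = 1_000;  // assumed base delay
const MAX_DELAY_MS = 60_000;  // assumed backoff ceiling

// Returns the delay before the next attempt, or null once retries are exhausted.
function nextRetryDelayMs(attempt: number): number | null {
  if (attempt >= MAX_RETRIES) return null; // row stays FAILED
  // 1s, 2s, 4s, 8s, 16s — capped exponential backoff
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}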

3. AWS Backend (main/server/, main/devops/)

Responsibilities: Processing, storage, and semantic indexing

  • S3 Storage: Hashed keys for deduplication and versioning
      • Organized by GUID: guid/audio/counter.mp3, guid/video/counter.mp4, guid/photo/counter.jpg, guid/transcript/counter.txt
  • Lambda Functions:
      • Ingest Manifest (ingest_manifest/): Validates manifests, checks for duplicates via S3 idempotency keys, enqueues to SQS
      • Process ASR (process_asr/): Transcribes audio using Whisper, enqueues to the embedding queue
      • Process Embedding (process_embedding/): Generates 384-dim embeddings (MiniLM) and stores them in PostgreSQL + pgvector (sketched after this list)
  • SQS Queues: ingestion-queue → ProcessASR → embedding-queue → ProcessEmbedding (with processing-dlq for failures)
  • Processing Pipeline:
      • Automatic Speech Recognition (ASR) via Whisper
      • Speaker diarization (planned)
      • Entity extraction (people, places, objects, commitments) (planned)
      • Embedding generation from transcripts using sentence-transformers/all-MiniLM-L6-v2
  • PostgreSQL + pgvector: Unified database for metadata and vector storage
      • Metadata: Pointers to S3 blobs, timestamps, location hash, privacy flags, relationships
      • Vectors: Embeddings stored via the pgvector extension for semantic similarity search
      • Benefits: Single database, native SQL queries, fast vector search, simpler operations than separate systems
  • Normalized Tables: Track canonical entities, relationships, and commitments (future: Neo4j option)
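
A hedged TypeScript sketch of the Process Embedding step, assuming a pg client, a memory_embeddings table, and an embed() helper that wraps the MiniLM model; none of these names come from main/server/:

import type { SQSEvent } from "aws-lambda";
import { Client } from "pg";

// Hypothetical: wraps sentence-transformers/all-MiniLM-L6-v2.
declare function embed(text: string): Promise<number[]>;

export async function handler(event: SQSEvent): Promise<void> {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  try {
    for (const record of event.Records) {
      const { event_id, transcript } = JSON.parse(record.body);
      const vector = await embed(transcript); // 384 dimensions
      // pgvector accepts a '[v1,v2,...]' text literal cast to vector
      await db.query(
        "INSERT INTO memory_embeddings (event_id, embedding) VALUES ($1, $2::vector)",
        [event_id, `[${vector.join(",")}]`]
      );
    }
  } finally {
    await db.end();
  }
}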

Data Flow

graph TD
    Glasses[Meta Ray-Ban Glasses] -->|Streaming A/V| Phone[Phone App]
    Phone -->|Meta SDK Wrapper| Manifest[Event Manifest]
    Phone -->|EventBuilder| Merged[Merged Events]
    Phone -->|UploadQueue| S3[S3 Storage]
    Phone -->|Manifest API| AWS[AWS Backend]

    AWS -->|Ingest Lambda| SQS1[Ingestion Queue]
    SQS1 -->|Process ASR Lambda| SQS2[Embedding Queue]
    SQS2 -->|Process Embedding Lambda| DB[PostgreSQL + pgvector]

    subgraph Storage
    S3
    DB
    end

Database Architecture

PostgreSQL + pgvector serves a dual purpose:

1. Metadata storage: Event records, S3 pointers, timestamps, privacy flags, relationships
2. Vector storage: Embeddings for semantic search (via the pgvector extension)

Why this approach?

  • A single database simplifies operations (no separate vector DB)
  • Native SQL for complex queries and relationships
  • pgvector provides fast vector similarity search
  • Easier to maintain and scale than separate systems
  • The design doc mentions LanceDB as an alternative, but PostgreSQL + pgvector was chosen for the MVP
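
With this layout, semantic search reduces to a single SQL query; the table and column names below are illustrative, not the actual schema:

import { Client } from "pg";

// <=> is pgvector's cosine-distance operator (smaller = more similar).
async function semanticSearch(db: Client, queryEmbedding: number[], k = 10) {
  const { rows } = await db.query(
    `SELECT event_id, embedding <=> $1::vector AS distance
       FROM memory_embeddings
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [`[${queryEmbedding.join(",")}]`, k]
  );
  return rows;
}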

Future considerations:

  • Topics (mutable data): May use Elasticsearch with BM25 for faster updates
  • Memories (immutable data): Continue using pgvector (updates are rare, appends are fast)

Event Manifest Schema

Event Manifests are JSON documents (schema version 2.1.0) that describe captured moments. The full JSON Schema is defined in schemas/manifest/v2.1.0/event_manifest.schema.json.

Key Fields:

  • event_id: Unique identifier ({device_id}-{boot_session_id}-{counter})
  • device_id: Device identifier
  • device_time_unix_ms: Device clock timestamp (may have skew)
  • t_start, t_end: Event time range (device clock, unix ms)
  • preroll_ms, postroll_ms: Buffer padding duration
  • files: Array of media files with SHA256 hashes for content-addressed storage
  • signals: ML model outputs (VAD segments, scene cuts, salience score, etc.)
  • privacy: PII level, retention class, redaction flags
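
Condensed as a TypeScript type for reference (a sketch; the JSON Schema file is authoritative, and any pii_level values beyond "low" are assumptions):

interface EventManifest {
  manifest_version: "2.1.0";
  event_id: string;            // {device_id}-{boot_session_id}-{counter}
  device_id: string;
  device_time_unix_ms: number; // device clock, may have skew
  t_start: number;             // unix ms, device clock
  t_end: number;
  preroll_ms: number;
  postroll_ms: number;
  files: Array<{
    type: string;              // "audio" | "video" | "photo"
    sha256: string;            // content-addressed storage key
    size: number;
    local_path: string;
  }>;
  signals: {
    salience_score: number;
    vad_segments: Array<{ start_ms: number; end_ms: number; confidence: number }>;
    user_marked_important: boolean;
  };
  privacy: {
    pii_level: string;         // e.g. "low"
    retention_class: string;   // e.g. "personal"
    redaction_flags: string[];
  };
}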

Manifest JSON Example

{
  "manifest_version": "2.1.0",
  "event_id": "device1-session1-000042",
  "device_id": "device1",
  "device_time_unix_ms": 1704067200000,
  "t_start": 1704067200000,
  "t_end": 1704067210000,
  "preroll_ms": 2000,
  "postroll_ms": 1000,
  "files": [
    {
      "type": "audio",
      "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
      "size": 10000,
      "local_path": "/tmp/audio.wav"
    }
  ],
  "signals": {
    "salience_score": 0.85,
    "vad_segments": [{"start_ms": 0, "end_ms": 5000, "confidence": 0.9}],
    "user_marked_important": false
  },
  "privacy": {
    "pii_level": "low",
    "retention_class": "personal",
    "redaction_flags": []
  }
}

See schemas/manifest/v2.1.0/README.md for full schema documentation and validation examples.

Memory Card Schema

Post-processing creates memory cards with:

  • Summary: Natural-language summary of the moment
  • Highlights: Key frames and audio clips
  • Entities: People, places, and objects involved
  • Commitments: Extracted todos, decisions, action items
  • Embedding: Vector representation for semantic search
  • Metadata: Timestamps, location hash, privacy info, S3 blob pointers
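
A hypothetical TypeScript shape for a memory card, following the list above (all field names are illustrative, not a published schema):

interface MemoryCard {
  summary: string;                   // natural-language summary of the moment
  highlights: {
    key_frames: string[];            // S3 pointers to frames
    audio_clips: string[];           // S3 pointers to clips
  };
  entities: {
    people: string[];
    places: string[];
    objects: string[];
  };
  commitments: string[];             // extracted todos, decisions, action items
  embedding: number[];               // vector for semantic search
  metadata: {
    t_start: number;                 // unix ms
    t_end: number;
    location_hash: string;
    privacy: { pii_level: string; retention_class: string };
    blob_pointers: string[];         // S3 object keys
  };
}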