
System Architecture

System Overview

1. Meta Ray-Ban Glasses Integration (main/app/services/meta-sdk-wrapper.ts)

Responsibilities: Streaming capture and manifest generation

  • Meta SDK Integration:
      • Uses the Meta Ray-Ban SDK for iOS/Android to stream audio and video
      • Built-in streaming support with real-time data capture
      • Virtual glasses available for development
  • User-Triggered Capture:
      • User manually triggers capture via "Hey Meta, remember this" or a button press
      • No automatic salience detection (battery optimization for v1)
      • User-defined memories provide training data for future ML models
  • Manifest Generation (sketched below):
      • Generates event IDs: {device_id}-{boot_session_id}-{counter}
      • Computes SHA256 hashes for captured files
      • Creates an Event Manifest JSON (v2.1.0) with timestamps, files, and privacy metadata
      • Includes preroll/postroll padding for context
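
A minimal TypeScript sketch of the ID and hashing steps, assuming a Node-style crypto digest (the Expo app would use an equivalent such as expo-crypto); buildEventId and hashFile are illustrative names, not the wrapper's actual API:

import { createHash } from "crypto";
import { readFile } from "fs/promises";

// Illustrative only: the real logic lives in services/meta-sdk-wrapper.ts.
function buildEventId(deviceId: string, bootSessionId: string, counter: number): string {
  // {device_id}-{boot_session_id}-{counter}, zero-padded as in the example manifest
  return `${deviceId}-${bootSessionId}-${String(counter).padStart(6, "0")}`;
}

async function hashFile(path: string): Promise<string> {
  // SHA256 of the file contents, used for content-addressed storage
  const data = await readFile(path);
  return createHash("sha256").update(data).digest("hex");
}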

2. Phone App (main/app/)

Responsibilities: Event aggregation and cloud sync

  • Built with Expo (React Native) for iOS/Android
  • Meta SDK Wrapper (services/meta-sdk-wrapper.ts): TypeScript integration with the Meta SDK
      • Handles streaming audio/video from the glasses
      • Generates event manifests from captured data
      • Manages stream start/stop and file capture
  • EventBuilder (services/event-builder.ts): State machine that merges overlapping events, deduplicates blobs by SHA256, and handles preroll/postroll overlaps
      • States: IDLE → RECEIVING → MERGING → EMITTING → IDLE
      • Simplified merge logic for user-triggered captures
  • UploadQueue (services/upload-queue.ts): SQLite-backed persistent queue with exponential-backoff retry (see the sketch after this list)
      • States: PENDING → UPLOADING → UPLOADED (with retry on FAILED)
      • Features: max 5 retries, 3 concurrent uploads, streaming upload support
      • Uploads happen in real time as data streams from the glasses
  • Database (services/database.ts): SQLite tables for the upload queue, manifest cache, and blob tracking
  • Uploads blobs to S3 and sends manifests to AWS API Gateway
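
A sketch of the retry policy implied above (max 5 retries with exponential backoff); the constant values and delay curve are assumptions, not the actual upload-queue code:

type UploadState = "PENDING" | "UPLOADING" | "UPLOADED" | "FAILED";

const MAX_RETRIES = 5;        // documented retry cap
const BASE_DELAY_MS = 1_000;  // assumed base delay
const MAX_DELAY_MS = 60_000;  // assumed backoff ceiling

// Returns the delay before the next attempt, or null once retries are exhausted.
function nextRetryDelayMs(attempt: number): number | null {
  if (attempt >= MAX_RETRIES) return null; // row stays FAILED
  // 1s, 2s, 4s, 8s, 16s — capped exponential backoff
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}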

3. AWS Backend (main/server/, main/devops/)

Responsibilities: Processing, storage, and semantic indexing

  • S3 Storage: Hashed keys for deduplication and versioning
      • Organized by GUID: guid/audio/counter.mp3, guid/video/counter.mp4, guid/photo/counter.jpg, guid/transcript/counter.txt
  • Lambda Functions:
      • Ingest Manifest (ingest_manifest/): Validates manifests, checks for duplicates via S3 idempotency keys, enqueues to SQS
      • Process ASR (process_asr/): Transcribes audio using Whisper, enqueues to the embedding queue
      • Process Embedding (process_embedding/): Generates 384-dim embeddings (MiniLM) and stores them in PostgreSQL + pgvector (sketched after this list)
  • SQS Queues: ingestion-queue → ProcessASR → embedding-queue → ProcessEmbedding (with processing-dlq for failures)
  • Processing Pipeline:
      • Automatic Speech Recognition (ASR) via Whisper
      • Speaker diarization (planned)
      • Entity extraction (people, places, objects, commitments) (planned)
      • Embedding generation from transcripts using sentence-transformers/all-MiniLM-L6-v2
  • PostgreSQL + pgvector: Unified database for metadata and vector storage
      • Metadata: Pointers to S3 blobs, timestamps, location hash, privacy flags, relationships
      • Vectors: Embeddings stored via the pgvector extension for semantic similarity search
      • Benefits: Single database, native SQL queries, fast vector search, simpler operations than separate systems
  • Normalized Tables: Track canonical entities, relationships, and commitments (future: Neo4j option)
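
A hedged TypeScript sketch of the Process Embedding step, assuming a pg client, a memory_embeddings table, and an embed() helper that wraps the MiniLM model; none of these names come from main/server/:

import type { SQSEvent } from "aws-lambda";
import { Client } from "pg";

// Hypothetical: wraps sentence-transformers/all-MiniLM-L6-v2.
declare function embed(text: string): Promise<number[]>;

export async function handler(event: SQSEvent): Promise<void> {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  try {
    for (const record of event.Records) {
      const { event_id, transcript } = JSON.parse(record.body);
      const vector = await embed(transcript); // 384 dimensions
      // pgvector accepts a '[v1,v2,...]' text literal cast to vector
      await db.query(
        "INSERT INTO memory_embeddings (event_id, embedding) VALUES ($1, $2::vector)",
        [event_id, `[${vector.join(",")}]`]
      );
    }
  } finally {
    await db.end();
  }
}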

Data Flow

graph TD
    Glasses[Meta Ray-Ban Glasses] -->|Streaming A/V| Phone[Phone App]
    Phone -->|Meta SDK Wrapper| Manifest[Event Manifest]
    Phone -->|EventBuilder| Merged[Merged Events]
    Phone -->|UploadQueue| S3[S3 Storage]
    Phone -->|Manifest API| AWS[AWS Backend]

    AWS -->|Ingest Lambda| SQS1[Ingestion Queue]
    SQS1 -->|Process ASR Lambda| SQS2[Embedding Queue]
    SQS2 -->|Process Embedding Lambda| DB[PostgreSQL + pgvector]

    subgraph Storage
    S3
    DB
    end

Database Architecture

PostgreSQL + pgvector serves a dual purpose:

1. Metadata storage: Event records, S3 pointers, timestamps, privacy flags, relationships
2. Vector storage: Embeddings for semantic search (via the pgvector extension)

Why this approach?

  • A single database simplifies operations (no separate vector DB)
  • Native SQL for complex queries and relationships
  • pgvector provides fast vector similarity search
  • Easier to maintain and scale than separate systems
  • The design doc mentions LanceDB as an alternative, but PostgreSQL + pgvector was chosen for the MVP
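
With this layout, semantic search reduces to a single SQL query; the table and column names below are illustrative, not the actual schema:

import { Client } from "pg";

// <=> is pgvector's cosine-distance operator (smaller = more similar).
async function semanticSearch(db: Client, queryEmbedding: number[], k = 10) {
  const { rows } = await db.query(
    `SELECT event_id, embedding <=> $1::vector AS distance
       FROM memory_embeddings
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [`[${queryEmbedding.join(",")}]`, k]
  );
  return rows;
}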

Future considerations:

  • Topics (mutable data): May use Elasticsearch with BM25 for faster updates
  • Memories (immutable data): Continue using pgvector (updates are rare, appends are fast)

Event Manifest Schema

Event Manifests are JSON documents (schema version 2.1.0) that describe captured moments. The full JSON Schema is defined in schemas/manifest/v2.1.0/event_manifest.schema.json.

Key Fields:

  • event_id: Unique identifier ({device_id}-{boot_session_id}-{counter})
  • device_id: Device identifier
  • device_time_unix_ms: Device clock timestamp (may have skew)
  • t_start, t_end: Event time range (device clock, unix ms)
  • preroll_ms, postroll_ms: Buffer padding duration
  • files: Array of media files with SHA256 hashes for content-addressed storage
  • signals: ML model outputs (VAD segments, scene cuts, salience score, etc.)
  • privacy: PII level, retention class, redaction flags
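
Condensed as a TypeScript type for reference (a sketch; the JSON Schema file is authoritative, and any pii_level values beyond "low" are assumptions):

interface EventManifest {
  manifest_version: "2.1.0";
  event_id: string;            // {device_id}-{boot_session_id}-{counter}
  device_id: string;
  device_time_unix_ms: number; // device clock, may have skew
  t_start: number;             // unix ms, device clock
  t_end: number;
  preroll_ms: number;
  postroll_ms: number;
  files: Array<{
    type: string;              // "audio" | "video" | "photo"
    sha256: string;            // content-addressed storage key
    size: number;
    local_path: string;
  }>;
  signals: {
    salience_score: number;
    vad_segments: Array<{ start_ms: number; end_ms: number; confidence: number }>;
    user_marked_important: boolean;
  };
  privacy: {
    pii_level: string;         // e.g. "low"
    retention_class: string;   // e.g. "personal"
    redaction_flags: string[];
  };
}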

Manifest JSON Example

{
  "manifest_version": "2.1.0",
  "event_id": "device1-session1-000042",
  "device_id": "device1",
  "device_time_unix_ms": 1704067200000,
  "t_start": 1704067200000,
  "t_end": 1704067210000,
  "preroll_ms": 2000,
  "postroll_ms": 1000,
  "files": [
    {
      "type": "audio",
      "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
      "size": 10000,
      "local_path": "/tmp/audio.wav"
    }
  ],
  "signals": {
    "salience_score": 0.85,
    "vad_segments": [{"start_ms": 0, "end_ms": 5000, "confidence": 0.9}],
    "user_marked_important": false
  },
  "privacy": {
    "pii_level": "low",
    "retention_class": "personal",
    "redaction_flags": []
  }
}

See schemas/manifest/v2.1.0/README.md for full schema documentation and validation examples.

Memory Card Schema

Post-processing creates memory cards with:

  • Summary: Natural-language summary of the moment
  • Highlights: Key frames and audio clips
  • Entities: People, places, and objects involved
  • Commitments: Extracted todos, decisions, action items
  • Embedding: Vector representation for semantic search
  • Metadata: Timestamps, location hash, privacy info, S3 blob pointers
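
A hypothetical TypeScript shape for a memory card, following the list above (all field names are illustrative, not a published schema):

interface MemoryCard {
  summary: string;                   // natural-language summary of the moment
  highlights: {
    key_frames: string[];            // S3 pointers to frames
    audio_clips: string[];           // S3 pointers to clips
  };
  entities: {
    people: string[];
    places: string[];
    objects: string[];
  };
  commitments: string[];             // extracted todos, decisions, action items
  embedding: number[];               // vector for semantic search
  metadata: {
    t_start: number;                 // unix ms
    t_end: number;
    location_hash: string;
    privacy: { pii_level: string; retention_class: string };
    blob_pointers: string[];         // S3 object keys
  };
}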