Ingest Window: Naive Datetime in _get_session_created_at + Retry Timestamp Drift

Metadata

  • Date: 2026-04-23
  • Status: fixed
  • Severity: major
  • Related issue/ticket: N/A
  • Owner: N/A

About

Overview:

Two related bugs in main/server/worldmm/pipeline/ingest_window.py:

  1. _get_session_created_at returns a timezone-naive datetime when DynamoDB stores a value without a timezone suffix (e.g. "2026-04-23T10:00:00" instead of "2026-04-23T10:00:00+00:00"). datetime.fromisoformat does not attach a timezone in that case. When .timestamp() is then called on a naive datetime, Python treats it as local time, not UTC, shifting window_start_s by the server's UTC offset and producing wrong start_iso/end_iso for the segment and visual embedding.

  2. On a retry delivery (existing is not None and processing_status != "complete"), start_iso/end_iso are recomputed from _get_session_created_at before the idempotency branch. If the DynamoDB lookup fails and falls back to now(), the recomputed timestamps diverge from the already-persisted existing.start_time/existing.end_time. store_visual_embedding then receives a timestamp that no longer matches the stored segment row, breaking time-based retrieval for the retried segment.
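
The first bug can be reproduced in isolation with the standard library. The sketch below uses the example value from item 1; the variable names are illustrative and are not taken from ingest_window.py.

```python
from datetime import datetime, timezone

raw = "2026-04-23T10:00:00"  # example value stored without a timezone suffix

naive = datetime.fromisoformat(raw)
aware = naive.replace(tzinfo=timezone.utc)

# fromisoformat attaches no tzinfo when the string carries no offset.
print(naive.tzinfo)  # None

# .timestamp() on a naive datetime interprets it as local time, so the result
# differs from the UTC reading by the host's UTC offset.
drift_s = naive.timestamp() - aware.timestamp()
print(drift_s)  # 0.0 on a UTC host, -7200.0 on a UTC+2 host
```

On a host whose local zone is UTC the drift is zero, which is why the bug can go unnoticed until the code runs on a machine with a different offset.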

Technical Questions:

  • Why does fromisoformat return a naive datetime? DynamoDB stores the value exactly as the client wrote it; if the client omitted the timezone designator, the string carries no offset and fromisoformat parses it into a naive datetime.
  • Why is the retry path affected? Lines 153-156 run unconditionally before the if existing: branch, so both the new and retry paths share the same (possibly wrong) timestamp calculation.

Reproduction Test

main/server/tests/unit/test_ingest_window_datetime.py — added as part of fix.

Root Cause

  1. Naive datetime: datetime.fromisoformat(raw) returns a naive datetime when the raw string carries no UTC offset; it does not assume UTC. Fix: call .replace(tzinfo=timezone.utc) when the result is naive, so that offset-less values are treated as UTC.
  2. Retry drift: start_iso/end_iso must come from the existing DB row on retry, not be recomputed. Fix: after the idempotency check, override start_iso/end_iso from existing.start_time/existing.end_time when existing is set.
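
The normalisation in item 1 might look roughly like the following. This is a minimal sketch, not the code from ingest_window.py: parse_created_at is a hypothetical stand-in for the parsing step inside _get_session_created_at, and it assumes that values stored without an offset were written as UTC.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_created_at(raw: Optional[str]) -> datetime:
    """Hypothetical helper: parse a stored ISO string into an aware UTC-based datetime."""
    if not raw:
        # Fallback when the lookup produced no value; the fix summary also
        # mentions logging a warning at this point.
        return datetime.now(timezone.utc)
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: offset-less values were written as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


print(parse_created_at("2026-04-23T10:00:00").isoformat())
# 2026-04-23T10:00:00+00:00
```

Strings that already carry an offset pass through unchanged, so only the offset-less case is affected by the fix.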

Fix Summary

  • _get_session_created_at: normalise naive result to UTC via .replace(tzinfo=timezone.utc), add a logger warning on fallback.
  • lambda_handler: move start_iso/end_iso derivation so the retry branch reads from existing rather than recomputing.
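
The retry-side change can be sketched as a single decision: reuse the persisted window when a row exists, and compute it only for a first delivery. Everything below is illustrative. ExistingSegment, resolve_window_iso, offset_s and duration_s are hypothetical names, and the way the real handler derives the window from the session creation time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass
class ExistingSegment:
    """Hypothetical shape of the persisted row; field names follow the report."""
    start_time: str
    end_time: str
    processing_status: str


def resolve_window_iso(
    existing: Optional[ExistingSegment],
    session_created_at: datetime,
    offset_s: float,
    duration_s: float,
) -> Tuple[str, str]:
    if existing is not None:
        # Retry delivery: reuse the persisted values so the embedding and the
        # segment row agree, even if session_created_at fell back to now().
        return existing.start_time, existing.end_time
    # First delivery: derive the window from the session creation time.
    start = session_created_at + timedelta(seconds=offset_s)
    end = start + timedelta(seconds=duration_s)
    return start.isoformat(), end.isoformat()


row = ExistingSegment("2026-04-23T10:00:30+00:00", "2026-04-23T10:00:40+00:00", "pending")
drifted = datetime(2026, 4, 23, 12, 0, tzinfo=timezone.utc)  # simulates a now() fallback
print(resolve_window_iso(row, drifted, 30, 10))
# ('2026-04-23T10:00:30+00:00', '2026-04-23T10:00:40+00:00')
```

Because the retry branch never consults session_created_at, a failed lookup on redelivery can no longer change the timestamps passed downstream.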

Verification

  • New unit tests pass: test_ingest_window_datetime.py.
  • Existing tests unchanged.