Wrong LoRA Adapter for VLM2Vec Embedding Model (Issue #403)

Date: 2026-05-07
File: main/server/worldmm/gpu_worker/server.py, line 134
Status: Fixed

Symptom

Visual embeddings produced by the GPU worker were in the wrong vector space, so semantic video search (/encode-video, /encode-text) returned poor or nonsensical results. The mismatch was silent: no exception was raised at load time or at inference time.

Root Cause

server.py loaded the PEFT LoRA adapter TIGER-Lab/VLM2Vec-LoRA on top of Qwen/Qwen2-VL-2B-Instruct:

# WRONG — VLM2Vec-LoRA targets Phi-3.5, not Qwen2-VL-2B-Instruct
_embed_model = PeftModel.from_pretrained(base_2b, "TIGER-Lab/VLM2Vec-LoRA")

VLM2Vec-LoRA was designed for the Phi-3.5 base model. Applying it to a Qwen2-VL-2B-Instruct base loads adapter weights trained for a different architecture without raising an error, so the resulting embeddings lie in the wrong vector space.

Fix

Replace the adapter name with the Qwen2-VL-compatible variant:

# CORRECT — VLM2Vec-Qwen2VL-2B targets Qwen2-VL-2B-Instruct
_embed_model = PeftModel.from_pretrained(base_2b, "TIGER-Lab/VLM2Vec-Qwen2VL-2B")
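
Because the original mismatch raised no error, a load-time guard could help catch a repeat. The following is a minimal sketch, not the code in server.py: it assumes the adapter repository ships an adapter_config.json whose base_model_name_or_path names the intended base model, which is not guaranteed for every adapter. The helpers normalize_model_id, adapter_matches_base, and load_checked_adapter are hypothetical names introduced here for illustration.

```python
from typing import Optional


def normalize_model_id(model_id: str) -> str:
    """Lower-case a Hub model id and strip whitespace and trailing slashes."""
    return model_id.strip().rstrip("/").lower()


def adapter_matches_base(adapter_base: Optional[str], expected_base: str) -> bool:
    """Return True only if the adapter declares the expected base model."""
    if not adapter_base:
        return False  # an adapter with no declared base cannot be verified
    return normalize_model_id(adapter_base) == normalize_model_id(expected_base)


def load_checked_adapter(base_model, adapter_id: str, expected_base: str):
    """Load a PEFT adapter, refusing if its declared base differs (hypothetical helper)."""
    # Imported lazily so the pure-Python check above works without peft installed.
    from peft import PeftConfig, PeftModel

    # Assumption: the adapter config records the base model it was trained on.
    declared = PeftConfig.from_pretrained(adapter_id).base_model_name_or_path
    if not adapter_matches_base(declared, expected_base):
        raise ValueError(
            f"Adapter {adapter_id!r} declares base {declared!r}, "
            f"expected {expected_base!r}"
        )
    return PeftModel.from_pretrained(base_model, adapter_id)
```

With a guard like this, a call such as load_checked_adapter(base_2b, adapter_id, "Qwen/Qwen2-VL-2B-Instruct") would fail loudly when the declared base differs. The comparison is a plain string match, so an adapter whose config records a local path or a renamed repository would be rejected even when it is compatible.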

Impact

All visual embeddings stored in WorldMMSegment.visual_embedding while the wrong adapter was in use are in the wrong vector space. Visual retrieval results from the memories chat flow (/encode-text → pgvector ANN search) would have been unreliable. Re-ingesting affected sessions is required to regenerate correct embeddings.
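
After re-ingesting, a small retrieval smoke test may help confirm that text and visual embeddings share a vector space again. The sketch below is an assumption about how such a check could look, not part of the fix. It takes a handful of text embeddings and the visual embeddings of their matching clips (how these are fetched from /encode-text and /encode-video is left out) and reports how often each text ranks its own clip first. The function names cosine and top1_match_rate are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top1_match_rate(
    text_vecs: Sequence[Sequence[float]],
    visual_vecs: Sequence[Sequence[float]],
) -> float:
    """Fraction of texts whose most similar visual vector is their own pair.

    text_vecs[i] and visual_vecs[i] are assumed to describe the same clip.
    """
    if not text_vecs or len(text_vecs) != len(visual_vecs):
        raise ValueError("need the same non-zero number of text and visual vectors")
    hits = 0
    for i, text_vec in enumerate(text_vecs):
        sims = [cosine(text_vec, v) for v in visual_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)  # first index on ties
        if best == i:
            hits += 1
    return hits / len(text_vecs)
```

A rate close to 1 / N for N pairs is what chance would give and suggests the embeddings are still mismatched. A clearly higher rate on hand-picked, unambiguous pairs is a reasonable sign that retrieval is working, though it does not replace a proper evaluation.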
