Wrong LoRA Adapter for VLM2Vec Embedding Model (Issue #403)
Date: 2026-05-07
File: main/server/worldmm/gpu_worker/server.py (line 134)
Status: Fixed
Symptom
Visual embeddings produced by the GPU worker were in the wrong vector space, so semantic video search (/encode-video, /encode-text) returned poor or nonsensical results. The mismatch was silent: no exception was raised at load time or at inference time.
Root Cause
server.py loaded the PEFT LoRA adapter TIGER-Lab/VLM2Vec-LoRA on top of Qwen/Qwen2-VL-2B-Instruct:
# WRONG — VLM2Vec-LoRA targets Phi-3.5, not Qwen2-VL-2B-Instruct
_embed_model = PeftModel.from_pretrained(base_2b, "TIGER-Lab/VLM2Vec-LoRA")
VLM2Vec-LoRA was designed for the Phi-3.5 base model. Applying it to a Qwen2-VL-2B-Instruct base raised no error, but the adapter weights were not trained for that architecture, so the model produced embeddings in the wrong vector space.
Fix
Replace the adapter name with the Qwen2-VL-compatible variant:
# CORRECT — VLM2Vec-Qwen2VL-2B targets Qwen2-VL-2B-Instruct
_embed_model = PeftModel.from_pretrained(base_2b, "TIGER-Lab/VLM2Vec-Qwen2VL-2B")
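Because the mismatch raised no error, a load-time guard could make this class of bug fail loudly. The following is a minimal sketch, not the code in server.py. It assumes the adapter's PEFT config records its base model as a Hub repo id in base_model_name_or_path, which is not guaranteed for every adapter (the field may hold a local path or be empty). The helper names are hypothetical.

```python
from typing import Optional


def normalize_repo_id(name: str) -> str:
    """Normalize a Hub repo id so comparison ignores case and stray whitespace."""
    return name.strip().strip("/").lower()


def adapter_matches_base(declared_base: Optional[str], expected_base: str) -> bool:
    """Return True only if the adapter declares the base model we intend to load."""
    if not declared_base:
        # An adapter that declares no base model cannot be verified.
        return False
    return normalize_repo_id(declared_base) == normalize_repo_id(expected_base)


def assert_adapter_matches_base(
    adapter_id: str, declared_base: Optional[str], expected_base: str
) -> None:
    """Raise ValueError instead of silently loading a mismatched adapter."""
    if not adapter_matches_base(declared_base, expected_base):
        raise ValueError(
            f"Adapter {adapter_id!r} declares base model {declared_base!r}, "
            f"but the loaded base model is {expected_base!r}."
        )


def declared_base_of(adapter_id: str) -> Optional[str]:
    """Read the base model recorded in the adapter's PEFT config.

    peft is imported lazily so the pure checks above work without it installed.
    This call fetches the adapter config from the Hub or a local path.
    """
    from peft import PeftConfig

    return PeftConfig.from_pretrained(adapter_id).base_model_name_or_path
```

A worker could call assert_adapter_matches_base(adapter_id, declared_base_of(adapter_id), "Qwen/Qwen2-VL-2B-Instruct") immediately before PeftModel.from_pretrained. If the declared value turns out to be a local path rather than a repo id, the comparison would need to be adapted.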
Impact
All visual embeddings stored in WorldMMSegment.visual_embedding while the wrong adapter was in use are in the wrong vector space. Visual retrieval results from the memories chat flow (/encode-text → pgvector ANN search) would have been unreliable. Re-ingesting affected sessions is required to regenerate correct embeddings.
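Re-ingestion first requires knowing which sessions were embedded with the wrong adapter. The sketch below shows one way to select them by time window. It assumes each segment carries a session identifier and an embedding timestamp; the SegmentRecord type and its field names are hypothetical and not taken from the WorldMMSegment schema, and the window boundaries would have to come from deployment history.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class SegmentRecord(NamedTuple):
    # Hypothetical projection of a segment row; field names are assumptions.
    session_id: str
    embedded_at: datetime


def sessions_to_reingest(
    segments: Iterable[SegmentRecord],
    wrong_from: datetime,
    wrong_until: datetime,
) -> List[str]:
    """Return the sorted, de-duplicated session ids that have at least one
    segment embedded inside the half-open window [wrong_from, wrong_until)."""
    affected = {
        seg.session_id
        for seg in segments
        if wrong_from <= seg.embedded_at < wrong_until
    }
    return sorted(affected)
```

A session is selected if any of its segments falls in the window, since a partially re-embedded session would mix vectors from two different spaces.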
See Also
docs/docs/gpu-worker.md: authoritative model/adapter table