Vinted implemented dense, embedding-based retrieval to address low recall on its multilingual e-commerce platform. The solution uses a Two-Tower Model with CLIP embeddings and serves over 1 billion items through Vespa's HNSW index. Key challenges included keeping latency under control for filtered ANN searches, keeping results consistent across different query types, and capping the number of nearest-neighbor matches so that loosely related items do not degrade the user experience. Performance optimizations included splitting indices per market bloc, retrying failed approximate searches with an exact-search fallback, and running on GraalVM with ZGC to improve tail latencies. The team also built workarounds on top of Vespa's global-phase ranking and reciprocal rank fusion to control how results are composed while staying within a sub-500ms latency budget.
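
To make the reciprocal rank fusion step concrete, here is a minimal sketch in plain Python. It is not Vinted's code or Vespa rank-profile syntax; the function name, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions for illustration. Each document is scored by summing `1 / (k + rank)` over every ranked list in which it appears, so items ranked well by both lexical and dense retrieval rise to the top.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Hypothetical helper: each document's score is the sum of
    1 / (k + rank) over every list that contains it (rank starts at 1).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs only: one lexical ranking and one dense (ANN) ranking.
lexical_hits = ["a", "b", "c"]
dense_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([lexical_hits, dense_hits])
print(fused)
```

Because the fused score depends only on rank positions, not on raw scores, the two retrievers do not need comparable score scales, which is one reason rank fusion is a convenient way to blend lexical and embedding-based results.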