Vinted's engineering team details how they built a personalised search autocomplete system serving 125 million suggestions across 24 languages at 4,700 QPS with 31ms P99 latency. The system uses a two-phase approach: an offline Self-Learning Suggestions (SLS) pipeline drawing candidates from product metadata and search logs, and an online Vespa-powered serving layer with edge-ngram indexing, fuzzy matching, and a LightGBM Learning-to-Rank model for personalisation. Key technical decisions include implementing edge-ngram tokenisation inside Vespa (dropping P99 from 220ms to 25ms), cascading query relaxation from exact prefix to fuzzy matching, and a 63-feature LTR model optimising NDCG@1. Over 35+ A/B experiments, suggestion usage grew from under 8% to over 20% of search sessions. Notable findings: query-based candidates are only 2% of the pool but drive ~50% of clicks, reducing debounce from 350ms to 100ms boosted usage ~12%, and scoped suggestions increased CTR but hurt conversion.
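The cascading query relaxation described above — exact prefix matching first, falling back to fuzzy matching only when the stricter phase returns nothing — can be sketched in a few lines. This is a minimal illustration under assumed names and thresholds, not Vinted's actual Vespa configuration; in production the matching runs against edge-ngram indexes inside Vespa rather than in application code.

```python
# Hypothetical sketch of cascading query relaxation: try an exact prefix
# match, then relax to fuzzy prefix matching (edit distance <= max_edits).
# All names and the edit-distance threshold are illustrative assumptions.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming (one row)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,            # delete ca
                dp[j - 1] + 1,        # insert cb
                prev + (ca != cb),    # substitute (or keep if equal)
            )
    return dp[-1]

def suggest(query: str, index: list[str], max_edits: int = 1) -> list[str]:
    """Phase 1: exact prefix match. Phase 2: fuzzy prefix fallback."""
    exact = [s for s in index if s.startswith(query)]
    if exact:
        return exact
    return [s for s in index
            if edit_distance(query, s[:len(query)]) <= max_edits]

suggestions = ["nike air max", "nike hoodie", "levis jeans"]
suggest("nike", suggestions)  # phase 1: exact prefix hits
suggest("nika", suggestions)  # phase 2: falls back to fuzzy matching
```

Running the fuzzy phase only on a prefix-match miss keeps the common case cheap, which mirrors why the cascade (rather than always-on fuzzy matching) helps keep tail latency low.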
Table of contents
- Generating and scoring 125 million suggestions
- Indexing suggestions into Vespa
- Matching user input in milliseconds
- Personalising ranking with Learning-to-Rank
- High-level architecture
- Vespa hardware
- Lessons from 35+ A/B tests
- Learnings along the way
- What's next
- References