Vinted's engineering team details how they built a personalised search autocomplete system serving 125 million suggestions across 24 languages at 4,700 QPS with 31ms P99 latency. The system uses a two-phase approach: an offline Self-Learning Suggestions (SLS) pipeline drawing candidates from product metadata and search logs, and an online Vespa-powered serving layer with edge-ngram indexing, fuzzy matching, and a LightGBM Learning-to-Rank model for personalisation. Key technical decisions include implementing edge-ngram tokenisation inside Vespa (dropping P99 from 220ms to 25ms), cascading query relaxation from exact prefix to fuzzy matching, and a 63-feature LTR model optimising NDCG@1. Over 35+ A/B experiments, suggestion usage grew from under 8% to over 20% of search sessions. Notable findings: query-based candidates are only 2% of the pool but drive ~50% of clicks, reducing debounce from 350ms to 100ms boosted usage ~12%, and scoped suggestions increased CTR but hurt conversion.
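The cascading query relaxation described above — exact prefix matching first, falling back to fuzzy matching only when the stricter phase returns nothing — can be sketched in a few lines. This is a minimal illustration under assumed names and thresholds, not Vinted's actual Vespa configuration; in production the matching runs against edge-ngram indexes inside Vespa rather than in application code.

```python
# Hypothetical sketch of cascading query relaxation: try an exact prefix
# match, then relax to fuzzy prefix matching (edit distance <= max_edits).
# All names and the edit-distance threshold are illustrative assumptions.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming (one row)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,            # delete ca
                dp[j - 1] + 1,        # insert cb
                prev + (ca != cb),    # substitute (or keep if equal)
            )
    return dp[-1]

def suggest(query: str, index: list[str], max_edits: int = 1) -> list[str]:
    """Phase 1: exact prefix match. Phase 2: fuzzy prefix fallback."""
    exact = [s for s in index if s.startswith(query)]
    if exact:
        return exact
    return [s for s in index
            if edit_distance(query, s[:len(query)]) <= max_edits]

suggestions = ["nike air max", "nike hoodie", "levis jeans"]
suggest("nike", suggestions)  # phase 1: exact prefix hits
suggest("nika", suggestions)  # phase 2: falls back to fuzzy matching
```

Running the fuzzy phase only on a prefix-match miss keeps the common case cheap, which mirrors why the cascade (rather than always-on fuzzy matching) helps keep tail latency low.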
Table of contents
- Generating and scoring 125 million suggestions
- Indexing suggestions into Vespa
- Matching user input in milliseconds
- Personalising ranking with Learning-to-Rank
- High-level architecture
- Vespa hardware
- Lessons from 35+ A/B tests
- Learnings along the way
- What's next
- References