Vinted's engineering team details how they built a personalised search autocomplete system serving 125 million suggestions across 24 languages at 4,700 QPS with 31ms P99 latency. The system uses a two-phase approach: an offline Self-Learning Suggestions (SLS) pipeline drawing candidates from product metadata and search logs, and an online Vespa-powered serving layer with edge-ngram indexing, fuzzy matching, and a LightGBM Learning-to-Rank model for personalisation. Key technical decisions include implementing edge-ngram tokenisation inside Vespa (dropping P99 from 220ms to 25ms), cascading query relaxation from exact prefix to fuzzy matching, and a 63-feature LTR model optimising NDCG@1. Over 35+ A/B experiments, suggestion usage grew from under 8% to over 20% of search sessions. Notable findings: query-based candidates are only 2% of the pool but drive ~50% of clicks, reducing debounce from 350ms to 100ms boosted usage ~12%, and scoped suggestions increased CTR but hurt conversion.
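The edge-ngram idea mentioned above can be illustrated with a minimal Python sketch: each suggestion is expanded into all of its prefixes at index time, so matching a user's partial input becomes a cheap exact-term lookup rather than a per-query prefix scan (the function name and parameters here are illustrative, not Vinted's actual code):

```python
def edge_ngrams(term: str, min_len: int = 1, max_len: int = 10) -> list[str]:
    """Expand a term into its edge n-grams (leading prefixes).

    In an autocomplete index, each suggestion is stored under every
    one of these prefixes, so a partial query like "dre" hits the
    suggestion "dresses" via exact token match.
    """
    term = term.lower()
    return [term[:n] for n in range(min_len, min(len(term), max_len) + 1)]

# Index time: expand the suggestion once.
print(edge_ngrams("dresses"))
# Query time: look the raw user input up directly; no wildcard scan needed.
```

The trade-off is index size for query latency: prefix expansion multiplies stored tokens, but turns every keystroke into a hash-style lookup, which is consistent with the P99 drop the article reports when the tokenisation was moved inside Vespa.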

24 min read · From vinted.engineering
Table of contents

- Generating and scoring 125 million suggestions
- Indexing suggestions into Vespa
- Matching user input in milliseconds
- Personalising ranking with Learning-to-Rank
- High-level architecture
- Vespa hardware
- Lessons from 35+ A/B tests
- Learnings along the way
- What's next
- References
