LinkedIn rebuilt its search infrastructure using large language models to enable semantic search that understands natural language intent rather than just keyword matching. The system uses GPU-based embedding retrieval, small language model ranking with cross-encoders, and LLM-based query understanding to serve millions of queries per second. Key innovations include multi-teacher distillation for training compact models, context compression techniques (summarization and embedding compression) to reduce inference costs, and an LLM judge framework for continuous quality measurement aligned with product policy. The approach achieved double-digit improvements in search quality while maintaining inference costs comparable to traditional recommendation systems.
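To make the multi-teacher distillation idea concrete, here is a minimal sketch of how a compact ranking model can be trained against the averaged soft targets of several larger teacher models. The function name, temperature, and loss weighting are illustrative assumptions, not LinkedIn's actual implementation.

```python
import torch
import torch.nn.functional as F


def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                     labels, temperature=2.0, alpha=0.5):
    """Blend a hard-label loss with a soft-target loss averaged over multiple teachers.

    student_logits: (batch, num_classes) scores from the compact ranking model.
    teacher_logits_list: list of (batch, num_classes) tensors, one per teacher.
    labels: (batch,) ground-truth relevance labels.
    """
    # Average the teachers' softened distributions into a single soft target.
    soft_targets = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student's softened predictions and the soft target.
    distill_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        soft_targets,
        reduction="batchmean",
    ) * (temperature ** 2)

    # Standard cross-entropy against the hard relevance labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * distill_loss + (1 - alpha) * hard_loss
```

In this sketch, raising the temperature smooths the teachers' distributions so the student learns from their relative preferences over documents rather than only the top label; the alpha weight trades that signal off against the hard labels.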

Table of contents

- Explainability in search
- Reasoning
- Model pruning
- Context pruning
- Embedding compression
