LinkedIn rebuilt its search infrastructure using large language models to enable semantic search that understands natural language intent rather than just keyword matching. The system uses GPU-based embedding retrieval, small language model ranking with cross-encoders, and LLM-based query understanding to serve millions of queries per second. Key innovations include multi-teacher distillation for training compact models, context compression techniques (summarization and embedding compression) to reduce inference costs, and an LLM judge framework for continuous quality measurement aligned with product policy. The approach achieved double-digit improvements in search quality while maintaining inference costs comparable to traditional recommendation systems.
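To make the multi-teacher distillation idea concrete, here is a minimal sketch of how a compact ranking model can be trained against the averaged soft targets of several larger teacher models. The function name, temperature, and loss weighting are illustrative assumptions, not LinkedIn's actual implementation.

```python
import torch
import torch.nn.functional as F


def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                     labels, temperature=2.0, alpha=0.5):
    """Blend a hard-label loss with a soft-target loss averaged over multiple teachers.

    student_logits: (batch, num_classes) scores from the compact ranking model.
    teacher_logits_list: list of (batch, num_classes) tensors, one per teacher.
    labels: (batch,) ground-truth relevance labels.
    """
    # Average the teachers' softened distributions into a single soft target.
    soft_targets = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student's softened predictions and the soft target.
    distill_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        soft_targets,
        reduction="batchmean",
    ) * (temperature ** 2)

    # Standard cross-entropy against the hard relevance labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * distill_loss + (1 - alpha) * hard_loss
```

In this sketch, raising the temperature smooths the teachers' distributions so the student learns from their relative preferences over documents rather than only the top label; the alpha weight trades that signal off against the hard labels.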

Table of contents

- Explainability in search
- Reasoning
- Model pruning
- Context pruning
- Embedding compression
