How LinkedIn Feed Uses LLMs to Serve 1.3 Billion Users

LinkedIn replaced five separate Feed retrieval systems with a single LLM-powered dual-encoder model to serve 1.3 billion users. Key engineering decisions include converting raw numerical features into percentile buckets (boosting the popularity-embedding correlation 30x), filtering training data down to positively engaged posts only (2.6x faster training, 15% better recall), using both easy and hard negatives for contrastive learning, and building a Generative Recommender with causal transformer attention and Multi-gate Mixture-of-Experts heads for multi-task ranking. Infrastructure innovations include shared context batching, a custom Flash Attention variant (GRMIS) that gives a 2x speedup, disaggregated CPU/GPU serving, and continuously running embedding refresh pipelines.
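To make the percentile-bucket idea concrete, here is a minimal sketch of how a raw count could be mapped to a bucket id before it is written into the model's text input. The function names, the 100-bucket default, and the `<views_p50>` token format are all assumptions made for illustration; the summary only says that raw numerical features were converted into percentile buckets, not how.

```python
from bisect import bisect_right


def percentile_edges(values, n_buckets=100):
    """Return n_buckets - 1 cut points taken from the empirical distribution.

    Hypothetical helper: cut point i is the value sitting at rank
    i * n / n_buckets in the sorted sample.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // n_buckets)] for i in range(1, n_buckets)]


def to_bucket(value, edges):
    """Map a raw value to a bucket id in the range 0..len(edges)."""
    return bisect_right(edges, value)


def bucket_token(name, value, edges):
    """Render the bucket as a short text token (format is an assumption)."""
    return f"<{name}_p{to_bucket(value, edges)}>"


# Toy sample standing in for observed view counts.
sample = list(range(1000))
edges = percentile_edges(sample)
token = bucket_token("views", 505, edges)
```

The appeal of this kind of encoding is that a heavy-tailed count such as 12 versus 12,000,000 views becomes a small, bounded vocabulary of rank-based tokens, which a language model can plausibly relate to popularity more easily than arbitrary digit strings.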

#machine-learning #llm #distributed-systems #linkedin #recommendation-systems

Apr 13 • 11m read time • From blog.bytebytego.com

Table of contents

How to stop babysitting your agents (Sponsored)
Five Librarians, One Library
The Model Is Only As Good As Its Input
Less Data, Better Model
The Feed Is a Story, Not a Snapshot
Making It All Work at Scale
Conclusion