LinkedIn replaced five separate Feed retrieval systems with a single LLM-powered dual encoder model to serve 1.3 billion users. Key engineering decisions include: converting raw numerical features into percentile buckets (boosting popularity-embedding correlation 30x), filtering training data to only positively-engaged posts (2.6x faster training, 15% better recall), using both easy and hard negatives for contrastive learning, and building a Generative Recommender with causal transformer attention and Multi-gate Mixture-of-Experts heads for multi-task ranking. Infrastructure innovations include shared context batching, a custom Flash Attention variant (GRMIS) for 2x speedup, disaggregated CPU/GPU serving, and continuously running embedding refresh pipelines.
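The percentile-bucket idea mentioned above can be sketched in a few lines. This is a minimal illustration, not LinkedIn's implementation: the function names, the nearest-rank cut points, and the `views_p3`-style token format are all assumptions made for the example. The point is that a raw count such as 10,000 views is replaced by its rank bucket in the training distribution, which gives the model a small, ordered vocabulary instead of arbitrary digit strings.

```python
from bisect import bisect_right


def percentile_edges(values, n_buckets=100):
    """Return n_buckets - 1 cut points taken from the sorted training values.

    Hypothetical helper: uses nearest-rank quantiles with no interpolation.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (k * n) // n_buckets)] for k in range(1, n_buckets)]


def to_bucket(value, edges):
    """Map a raw value to a bucket index in the range [0, len(edges)]."""
    return bisect_right(edges, value)


def to_token(name, value, edges):
    """Render the bucket as a text token (the format is an assumption)."""
    return f"{name}_p{to_bucket(value, edges)}"


# Toy training distribution of view counts, split into 4 buckets for readability.
views = [1, 2, 3, 5, 8, 13, 21, 34, 55, 10000]
edges = percentile_edges(views, n_buckets=4)

print(edges)                             # cut points derived from the data
print(to_token("views", 20, edges))      # a mid-range post
print(to_token("views", 10000, edges))   # the outlier lands in the top bucket
```

A production system would compute the cut points offline over the full feature distribution and likely use more buckets; the four-bucket split here only keeps the example easy to trace.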
Table of contents
Five Librarians, One Library
The Model Is Only As Good As Its Input
Less Data, Better Model
The Feed Is a Story, Not a Snapshot
Making It All Work at Scale
Conclusion