Instacart's ML team shares how they rebuilt their Shopping Hub recommendation system using LLMs. The new AI-native platform uses a top-down, cascaded generation approach: a page design agent generates personalized themes from user context, a fine-tuned student model generates retrieval keywords via RAG, and quality/diversity filtering guards against off-brand or redundant content before handing off to the existing ranking infrastructure. Key techniques include teacher-student fine-tuning on Llama/Qwen models, RAG-based keyword candidate pruning (reducing generation costs 15-20%), and a fine-tuned DeBERTa cross-encoder for scalable quality filtering (a 99% cost reduction vs. LLM inference). Early results are promising: generative placements outperform static baselines in offline evaluations, and A/B tests show similar gains. Key learnings: keep each modeling task narrowly focused, invest heavily in evals, and add structure to the input and output layers.
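The cascaded flow described above can be sketched as a small pipeline. This is an illustrative sketch only: the function names, the `Placement` type, and the toy keyword index are all assumptions, and the LLM agent, student model, and DeBERTa cross-encoder are replaced with trivial stand-ins to show how the stages chain together.

```python
from dataclasses import dataclass, field

@dataclass
class Placement:
    """A candidate themed carousel for the Shopping Hub (illustrative)."""
    theme: str
    keywords: list = field(default_factory=list)

def design_page(user_context):
    # Phase 1 (stand-in): in the real system a page-design agent prompts an
    # LLM with user context to produce personalized themes.
    return [Placement(theme=f"Ideas for {i}") for i in user_context["interests"]]

def add_keywords(placement, keyword_index):
    # Phase 2 (stand-in): a fine-tuned student model with RAG generates
    # retrieval keywords; here we just look them up in a toy index.
    placement.keywords = keyword_index.get(placement.theme, [])
    return placement

def passes_filter(placement, seen_themes):
    # Phase 3 (stand-in): a DeBERTa cross-encoder scores quality; we
    # approximate quality as "has keywords" and diversity as "theme unseen".
    ok = bool(placement.keywords) and placement.theme not in seen_themes
    if ok:
        seen_themes.add(placement.theme)
    return ok

def build_hub(user_context, keyword_index):
    # Cascade: themes -> keywords -> quality/diversity filter; survivors
    # would then flow into the existing product/pagewise ranking stack.
    seen = set()
    placements = [add_keywords(p, keyword_index)
                  for p in design_page(user_context)]
    return [p for p in placements if passes_filter(p, seen)]

index = {"Ideas for grilling": ["charcoal", "bbq sauce"]}
hub = build_hub({"interests": ["grilling", "grilling", "baking"]}, index)
# The duplicate "grilling" theme is deduped and the keyword-less "baking"
# theme is dropped, leaving one placement.
```

The point of the top-down structure is that each stage only solves one narrow task, so each model can be small, separately evaluated, and swapped out without touching the rest of the cascade.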
Table of contents
Introduction
Methodology
Phase 1: Page Design & Theme Generation
Phase 2: Retrieval Keyword Generation
Phase 3: Quality and Diversity Filtering
Phase 4: Product & Pagewise Ranking
Designing for Rapid Iteration: Treating Evals as a First-Class Citizen
Bringing the Pieces Together