DoorDash describes a production framework for generating hyper-personalized grocery store carousels using LLMs offline, avoiding inline LLM latency. The system uses a structured 'consumer memory block' as typed input to batch LLM calls that produce carousel definitions (titles, subtitles, search intents). Generated intents are embedded and stored in Milvus, then served at request time via hybrid retrieval that combines embedding-based retrieval (EBR) with structured taxonomy lookup, so no LLM sits in the request path. Key engineering decisions include sharded Metaflow batch pipelines for millions of consumers, blue/green Milvus collection swaps for zero-downtime refreshes, per-use-case memory block trimming to reduce token cost and improve quality, and an LLM-as-judge offline evaluation framework that treats prompt changes like versioned model artifacts with CI. A/B results showed a ~1% relative increase in pet product order rate and a $0.47 increase in per-user spend over three weeks.
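The serving path summarized above (precomputed intents, a blue/green swap, and hybrid retrieval with no LLM call at request time) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class and function names (`AliasedStore`, `hybrid_retrieve`, `CarouselIntent`, `Item`) are hypothetical, the store is a plain in-memory stand-in rather than the Milvus client API, and the merge rule (nearest neighbors first, then taxonomy matches, de-duplicated) is one simple way to combine the two sources, not necessarily the one DoorDash uses.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    item_id: str
    category: str            # taxonomy path, e.g. "pet/dog/food" (hypothetical)
    embedding: List[float]


@dataclass
class CarouselIntent:
    title: str
    query_embedding: List[float]  # embedding of an offline LLM-generated search intent
    category: str                 # structured taxonomy hint for the same intent


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class AliasedStore:
    """In-memory stand-in for a vector store with a blue/green alias.

    Readers always go through the alias, so a refreshed collection can be
    fully loaded before any traffic sees it.
    """

    def __init__(self) -> None:
        self.collections: Dict[str, Dict[str, List[CarouselIntent]]] = {}
        self.alias = None  # name of the collection currently serving reads

    def load(self, name: str, intents_by_consumer: Dict[str, List[CarouselIntent]]) -> None:
        self.collections[name] = intents_by_consumer

    def swap(self, name: str) -> None:
        # Repoint the alias only if the target collection has been loaded.
        if name not in self.collections:
            raise KeyError(name)
        self.alias = name

    def intents_for(self, consumer_id: str) -> List[CarouselIntent]:
        return self.collections[self.alias].get(consumer_id, [])


def hybrid_retrieve(intent: CarouselIntent, catalog: List[Item], k: int = 3) -> List[str]:
    """Combine embedding-based retrieval with a taxonomy lookup."""
    # Embedding-based retrieval: top-k items by cosine similarity.
    by_similarity = sorted(
        catalog,
        key=lambda it: cosine(intent.query_embedding, it.embedding),
        reverse=True,
    )[:k]
    # Structured lookup: items whose taxonomy path falls under the intent's category.
    by_taxonomy = [it for it in catalog if it.category.startswith(intent.category)]
    # Merge, keeping first occurrence order and dropping duplicates.
    merged, seen = [], set()
    for it in by_similarity + by_taxonomy:
        if it.item_id not in seen:
            seen.add(it.item_id)
            merged.append(it.item_id)
    return merged
```

In this sketch a refresh would call `load("green", ...)` with the newly generated intents and then `swap("green")`, while request handlers only ever call `intents_for` and `hybrid_retrieve`.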
Table of contents
Current issues with using LLMs for recommendations
A multi-stage pipeline system architecture
Product impact
Lessons learned
Conclusion