Offline LLMs, Online Personalization: Generating carousels at DoorDash

DoorDash describes a production framework that generates hyper-personalized grocery store carousels with LLMs offline, avoiding inline LLM latency. The system passes a structured 'consumer memory block' as typed input to batch LLM calls that produce carousel definitions (titles, subtitles, search intents). Generated intents are embedded and stored in Milvus, then served at request time via hybrid retrieval that combines embedding-based retrieval (EBR) with structured taxonomy lookup, with no LLM in the request path. Key engineering decisions include sharded Metaflow batch pipelines that cover millions of consumers, blue/green Milvus collection swaps for zero-downtime refreshes, per-use-case memory block trimming to reduce token cost and improve quality, and an LLM-as-judge offline eval framework that treats prompt changes like versioned model artifacts with CI. A/B results showed a ~1% relative increase in pet product order rate and a $0.47 increase in per-user spend over three weeks.
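To make the request-time path concrete, here is a minimal sketch of hybrid retrieval with no LLM call involved. It is not DoorDash's implementation: the `Product` and `Intent` types, the `hybrid_retrieve` function, and the additive `taxonomy_boost` are all hypothetical, and a linear scan over an in-memory list stands in for the Milvus lookup. The summary does not say how the embedding and taxonomy signals are actually combined, so the additive boost is only one plausible choice.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Product:
    name: str
    category: str                  # taxonomy node (hypothetical format)
    embedding: Tuple[float, ...]


@dataclass(frozen=True)
class Intent:
    text: str                      # search intent generated offline by the LLM
    category: str                  # taxonomy node attached to the intent (assumed)
    embedding: Tuple[float, ...]   # computed offline and stored ahead of time


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(
    intent: Intent,
    catalog: Sequence[Product],
    k: int = 2,
    taxonomy_boost: float = 0.2,   # assumed weighting, not from the article
) -> List[str]:
    """Rank products by embedding similarity plus a boost for taxonomy matches."""
    scored = []
    for product in catalog:
        score = cosine(intent.embedding, product.embedding)
        if product.category == intent.category:
            score += taxonomy_boost
        scored.append((score, product.name))
    # Highest score first; ties broken by name for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]
```

Because the intent embedding already exists when the request arrives, the only work left at serving time is a similarity search and a structured lookup, which is what keeps LLM latency out of the request path.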

#llm

#vector-search

#recommendation-systems

May 27•20m read time•From careersatdoordash.com

Table of contents

Current issues with using LLMs for recommendations
A multi-stage pipeline system architecture
Product impact
Lessons learned
Conclusion
