Using LLMs to build content embeddings for DoorDash search and recommendations

DoorDash engineers describe how they replaced sparse behavioral signals with LLM-generated merchant and item profiles to power content-first embeddings across food, grocery, retail, and gifting verticals. The post covers the full pipeline: incremental Metaflow-based embedding refresh, evaluation with an LLM-as-a-judge harness (Hit@K, nDCG@K), and model selection (gemini-embedding-001 with 256-dimensional Matryoshka Representation Learning (MRL) embeddings). Key finding: data quality (LLM profiles) matters more than model choice, delivering a +31% gain in item similarity versus +6% from a better encoder alone. Production results include a 3.65% reduction in null search rate, +0.66% search CVR, and +2.4% homepage order rate, alongside an improvement in offline precision@10 from 68% to 85%. The post also discusses limitations of text-based consumer embeddings and future directions, including semantic IDs, generative retrieval, and context-conditioned consumer representations.
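For readers unfamiliar with the metrics and the MRL truncation mentioned above, here is a minimal Python sketch. It is not DoorDash's harness: the function names are hypothetical, relevance grades are assumed to be non-negative numbers (for example, scores assigned by an LLM judge), the ideal ranking for nDCG is computed only from the grades of the retrieved list, and the truncate-then-renormalize step is the usual way MRL embeddings are shortened rather than a detail confirmed by the post.

```python
import math
from typing import List, Sequence


def hit_at_k(relevances: Sequence[float], k: int) -> float:
    """Return 1.0 if any of the top-k results has a grade > 0, else 0.0."""
    return 1.0 if any(r > 0 for r in relevances[:k]) else 0.0


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the same grades sorted best-first.

    Assumption: the ideal ordering uses only the grades in `relevances`,
    not every judged item in the corpus.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def truncate_and_normalize(vec: Sequence[float], dim: int = 256) -> List[float]:
    """Keep the first `dim` components and rescale to unit L2 norm.

    Assumption: the embedding was trained with MRL, so its leading
    components remain useful on their own.
    """
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head
```

In this sketch, `relevances` is the list of grades for one query's results in ranked order; averaging `hit_at_k` and `ndcg_at_k` over many queries gives the aggregate scores.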

#llm

#rag

#vector-search

#recommendation-systems

Apr 14 • 21m read time • From careersatdoordash.com

Table of contents

Traditional playbook for content embeddings
Why content-first and why now
Product applications
Limitation: Consumer embeddings from consumer profiles
Future directions
Acknowledgements
Reference
