DoorDash engineers describe how they replaced sparse behavioral signals with LLM-generated merchant and item profiles to power content-first embeddings across the food, grocery, retail, and gifting verticals. The post covers the full pipeline: incremental Metaflow-based embedding refresh, evaluation with an LLM-as-a-judge harness (Hit@K, nDCG@K), and model selection (gemini-embedding-001 with 256-dim MRL). Key finding: data quality (the LLM profiles) matters more than model choice, delivering +31% item similarity gains vs. +6% from a better encoder alone. Production results include a 3.65% reduction in null search rate, +0.66% search CVR, +2.4% homepage order rate, and an increase in offline precision@10 from 68% to 85%. The post also discusses the limitations of text-based consumer embeddings and future directions, including semantic IDs, generative retrieval, and context-conditioned consumer representations.
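For readers unfamiliar with the two offline metrics named above, here is a minimal sketch of how Hit@K and nDCG@K are commonly computed. It is not DoorDash's harness: the function names, the dictionary of judge-assigned relevance grades, and the choice of linear gain (rather than the exponential 2^grade - 1 variant) are all assumptions made for illustration.

```python
import math


def hit_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any of the top-k retrieved ids is relevant, else 0.0."""
    return 1.0 if any(r in relevant_ids for r in ranked_ids[:k]) else 0.0


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain, log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, grades, k):
    """nDCG@K for one query.

    `grades` maps item id -> relevance grade (hypothetically assigned by an
    LLM judge); items missing from `grades` are treated as grade 0.
    """
    gains = [grades.get(r, 0.0) for r in ranked_ids]
    ideal = sorted(grades.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical grades for a single query.
    grades = {"a": 2.0, "b": 1.0}
    print(hit_at_k(["x", "b", "a"], set(grades), k=2))
    print(ndcg_at_k(["b", "a", "x"], grades, k=3))
```

In practice both metrics are averaged over a set of evaluation queries; the per-query functions above are the building blocks for that average.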

21m read time · From careersatdoordash.com
Table of contents
Traditional playbook for content embeddings
Why content-first and why now
Product applications
Limitation: Consumer embeddings from consumer profiles
Future directions
Acknowledgements
Reference
