At CIKM 2025, DoorDash shared a framework for learning generalizable multimodal embeddings that improve ranking and retrieval performance for consumer packaged goods (CPG).



DoorDash developed DashCLIP, a multimodal embedding framework that combines text and image encoders to generate semantic representations of products and user queries. The system uses contrastive learning on 400,000 products, domain adaptation from BLIP-14M, and LLM-augmented relevance datasets to improve ad ranking and retrieval. Online A/B tests showed significant improvements in engagement and revenue, with the model now serving 100% of traffic. The embeddings also generalize well to other e-commerce tasks like category prediction and relevance scoring.
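To make the contrastive-learning idea concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss over paired query and product embeddings, plus cosine-similarity ranking for retrieval. It is an illustration under stated assumptions, not DoorDash's implementation: the function names, the pure-Python vectors, and the temperature of 0.07 are all hypothetical choices, and the summary above does not specify the exact loss DashCLIP uses.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity_matrix(queries, products):
    """Return sims[i][j] = cosine similarity of query i and product j."""
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in products]
    return [[sum(a * b for a, b in zip(qi, pj)) for pj in p] for qi in q]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(queries, products, temperature=0.07):
    """Average of query->product and product->query cross-entropy.

    Assumes queries[i] and products[i] form a matched (positive) pair and
    every other product in the batch acts as a negative for query i.
    The temperature value is an assumed hyperparameter.
    """
    sims = cosine_similarity_matrix(queries, products)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed))


def rank_products(query, products):
    """Return product indices ordered from most to least similar to the query."""
    sims = cosine_similarity_matrix([query], products)[0]
    return sorted(range(len(products)), key=lambda j: sims[j], reverse=True)
```

In a real system the vectors would come from the text and image encoders rather than hand-written lists. The loss pulls matched query and product embeddings together and pushes mismatched ones apart, which is what lets a nearest-neighbor lookup such as `rank_products` serve as a retrieval step.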

DashCLIP: Leveraging multimodal models for generating semantic embeddings