Instacart rebuilt its query understanding system using LLMs to better handle long-tail searches and ambiguous queries. The team progressed from context engineering with RAG to fine-tuning smaller models such as Llama-3-8B, consolidating multiple specialized models into a unified system. They implemented a hybrid architecture: an offline pipeline generates high-quality training data and caches results for common queries, while a fine-tuned real-time model handles rare searches. Through adapter merging, GPU optimization, and quantization experiments, they reduced latency from 700 ms to 300 ms while improving search quality metrics by 6% on tail queries.
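One of the latency levers named above is adapter merging. As a minimal illustrative sketch (not Instacart's actual code), the snippet below merges a fine-tuned LoRA adapter back into a Llama-3-8B base model using Hugging Face's PEFT library; the adapter path and output directory are hypothetical placeholders.

```python
# Sketch: fold a fine-tuned LoRA adapter into the base model so inference
# runs on plain weights, with no separate adapter matmul on the hot path.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Meta-Llama-3-8B"   # base model named in the post
ADAPTER_PATH = "qu-llama3-lora-adapter"     # hypothetical fine-tuned adapter

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto")
model = PeftModel.from_pretrained(base, ADAPTER_PATH)

# merge_and_unload() applies the low-rank update (W <- W + B @ A) to each
# wrapped layer and returns a plain transformers model with no PEFT wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("qu-llama3-8b-merged")
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained("qu-llama3-8b-merged")
```

The merged checkpoint can then be quantized and served like any stock model, which fits the serving optimizations (quantization, GPU tuning) the summary mentions.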

13 min read · from tech.instacart.com
Table of contents
- Introduction
- Challenges in Traditional Query Understanding
- The Advantages of LLMs
- LLM as QU: Our Strategy in Action
  1. Query Category Classification
  2. Query Rewrites
  3. Semantic Role Labeling (SRL)
- Building a New Foundation: Fine-Tuning for Real-Time Inference
- Distilling Knowledge via Fine-Tuning
- The Path to Production: Taming Real-Time Latency
- Key Takeaways
