Building Biz Ask Anything: From Prototype to Product

Yelp's engineering team details how they evolved 'Biz Ask Anything' (Yelp Assistant on business pages) from a two-week prototype to a production system over nine months. Key challenges covered include: building near-real-time data pipelines (NRT indices, Cassandra EAV store) with sub-100ms reads; a multi-stage question analysis pipeline with fine-tuned GPT-4.1-nano models for Trust & Safety classification, inquiry type routing, content source selection, and keyword generation; and multi-dimensional answer quality evaluation using LLM-as-judge graders (correctness, completeness, evidence relevance). The post also covers reducing p75 latency from 10-20s to under 3s via streaming (SSE), async pipelines, and model prioritization; cutting cost to 25% of the original via smaller fine-tuned models, intelligent context window trimming, and dynamic prompt composition; and content-driven suggested questions that lifted engagement by ~50% and reduced the inability-to-answer rate by ~26%.

#llm

#deep-learning

#rag

Apr 10 • 27m read time • From engineeringblog.yelp.com

Table of contents

Finding an Answer in an Ocean of Content
The Life of a Question
The Data Problem - Scalable, Fresh, Searchable
Deconstructing the Question - Intent, Safety, and Retrieval
Answer Quality - Accuracy, Helpfulness and Voice
Performance - Time-to-First-Token is Crucial
Cost - Every token counts
User Education - “What Can I Even Ask?”
Acknowledgements
