Yelp's engineering team details how it evolved 'Biz Ask Anything' (Yelp Assistant on business pages) from a two-week prototype to a production system over nine months. Key challenges covered include: building near-real-time data pipelines (NRT indices, Cassandra EAV store) with sub-100ms reads; a multi-stage question analysis pipeline that uses fine-tuned GPT-4.1-nano models for Trust & Safety classification, inquiry type routing, content source selection, and keyword generation; multi-dimensional answer quality evaluation using LLM-as-judge graders (correctness, completeness, evidence relevance); latency reduction from 10-20s to under 3s at p75 via streaming (SSE), async pipelines, and model prioritization; cost reduction to 25% of the original via smaller fine-tuned models, intelligent context window trimming, and dynamic prompt composition; and content-driven suggested questions that lifted engagement by ~50% and reduced the inability-to-answer rate by ~26%.
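
As a rough illustration of how a multi-stage question analysis step can be combined with an async pipeline, the sketch below runs four analysis stages concurrently with `asyncio.gather`. It is a minimal sketch under stated assumptions, not Yelp's implementation. Every function name, the `QuestionAnalysis` type, and the toy heuristics are hypothetical stand-ins for the fine-tuned model calls, and the summary does not say which stages actually run in parallel.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class QuestionAnalysis:
    is_safe: bool
    inquiry_type: str
    content_sources: list[str]
    keywords: list[str]


# Each stage is a hypothetical stand-in for a call to a small fine-tuned model.
# The heuristics are toys; only the concurrency pattern is the point here.
async def classify_safety(question: str) -> bool:
    await asyncio.sleep(0)  # placeholder for a network round trip
    return "password" not in question.lower()


async def classify_inquiry_type(question: str) -> str:
    await asyncio.sleep(0)
    return "hours" if "open" in question.lower() else "general"


async def select_content_sources(question: str) -> list[str]:
    await asyncio.sleep(0)
    return ["business_attributes", "reviews"]


async def generate_keywords(question: str) -> list[str]:
    await asyncio.sleep(0)
    words = (w.strip("?.,!").lower() for w in question.split())
    return [w for w in words if len(w) > 3]


async def analyze_question(question: str) -> QuestionAnalysis:
    # Assumes the stages are independent, so total latency is roughly the
    # slowest stage rather than the sum of all four.
    is_safe, inquiry_type, sources, keywords = await asyncio.gather(
        classify_safety(question),
        classify_inquiry_type(question),
        select_content_sources(question),
        generate_keywords(question),
    )
    return QuestionAnalysis(is_safe, inquiry_type, sources, keywords)


if __name__ == "__main__":
    print(asyncio.run(analyze_question("Are you open on Sundays?")))
```

If one stage depended on another (for example, skipping retrieval when the safety check fails), that stage would be awaited first and the rest gathered afterwards.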

27m read time
From engineeringblog.yelp.com
Table of contents
Finding an Answer in an Ocean of Content
The Life of a Question
The Data Problem — Scalable, Fresh, Searchable
Deconstructing the Question — Intent, Safety, and Retrieval
Answer Quality — Accuracy, Helpfulness and Voice
Performance — Time-to-First-Token is Crucial
Cost - Every token counts
User Education — “What Can I Even Ask?”
Acknowledgements
