Poor retrieval quality, not model size or prompt design, is the primary driver of LLM hallucinations in RAG systems. Across the HaluEval, TruthfulQA, and FaithDial benchmarks, retrieval failures consistently dominate the causes of hallucination. The article identifies five key failure modes: retrieval drift, context truncation, stale index poisoning, low-relevance top-k retrieval, and inter-agent miscommunication. It covers four dimensions of retrieval quality improvement: embedding model selection, chunking architecture, retrieval strategy (hybrid search, cross-encoder re-ranking, relevance thresholding), and index freshness. In multi-agent pipelines, retrieval failures compound silently across agents, making them especially hard to detect. Practical recommendations include auditing retrieval before upgrading models, implementing hybrid search, enforcing similarity thresholds, and validating context at every agent boundary.
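Two of the recommendations above, hybrid search and similarity thresholding, can be sketched in a few lines of plain Python. This is a minimal illustration, not Weaviate's implementation: the function names and the 0.75 threshold are assumptions, reciprocal rank fusion is one common way to merge vector and keyword result lists, and a production system would use a vector database's built-in hybrid query instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists (e.g. vector and keyword results).

    Each ranking is a list of doc ids ordered best-first; RRF rewards
    documents that rank highly in any list: score = sum(1 / (k + rank)).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def filter_by_threshold(query_emb, chunks, threshold=0.75):
    """Relevance thresholding: drop chunks whose similarity to the query
    falls below a floor, instead of blindly passing top-k to the LLM."""
    scored = sorted(
        ((cosine_similarity(query_emb, emb), text) for text, emb in chunks),
        reverse=True,
    )
    return [(score, text) for score, text in scored if score >= threshold]
```

The thresholding step is what prevents low-relevance top-k retrieval: when nothing in the index clears the floor, the pipeline can decline to answer rather than feed the model weak context.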

11 min read · From weaviate.io
Table of contents

- Understanding the Retrieval Layer in RAG Systems
- How Retrieval Failure Drives LLM Hallucination: Evidence from Research
- Why Scaling the Model Does Not Solve a Retrieval Problem
- Four Dimensions of Retrieval Quality
- Evaluating Retrieval Quality: A Practical Measurement Framework
- A Compounding Problem: Retrieval Failure in Multi-Agent Systems
- Practical Recommendations for Production Systems
- Summary
