An experimental comparison of two RAG pipelines, one with query optimization and neighbor expansion and one without, across three datasets: clean corpus questions, messy corpus questions, and random real-world questions. The complex pipeline showed minimal advantage on clean synthetic questions but significantly outperformed the naive one on diffuse, multi-faceted queries by reducing fabrication, at the price of a 41% increase in cost and a 49% increase in latency. The naive pipeline failed by omission (hallucinating the information it had not retrieved), while the complex pipeline failed by inflation (over-synthesizing across sources). Query optimization helped 38% of questions but hurt 27%, which suggests it needs careful tuning. Most of the cost comes from the reranker, not from the added context.
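As a rough illustration of what neighbor expansion means here, the sketch below widens each retrieved chunk to include the chunks next to it in the same document. The summary does not say how the expansion was implemented, so the function name `expand_neighbors`, the `window` parameter, and the assumption that chunks are indexed in document order are all hypothetical.

```python
from typing import List, Sequence


def expand_neighbors(
    hit_indices: Sequence[int], num_chunks: int, window: int = 1
) -> List[int]:
    """Return sorted, de-duplicated chunk indices covering each hit plus
    `window` chunks on either side, clipped to the document bounds.

    Hypothetical helper: assumes chunks of one document are numbered
    0..num_chunks-1 in reading order.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    expanded = set()
    for idx in hit_indices:
        if not 0 <= idx < num_chunks:
            continue  # ignore hits that fall outside the document
        lo = max(0, idx - window)
        hi = min(num_chunks - 1, idx + window)
        expanded.update(range(lo, hi + 1))
    return sorted(expanded)


if __name__ == "__main__":
    # Hits at chunks 0 and 5 in a 10-chunk document, one neighbor each side.
    print(expand_neighbors([0, 5], num_chunks=10))
```

Overlapping windows are merged through the set, so adjacent hits do not send the same chunk to the model twice. That keeps the added context smaller than a per-hit count would suggest.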