Traditional RAG systems lose context when documents are split into chunks, which can lead to irrelevant retrievals. Contextual retrieval, introduced by Anthropic in 2024, addresses this by using an LLM to generate a short contextual description for each chunk before indexing, situating the chunk within its source document. The description is prepended to the chunk, and the enriched chunk is then used for both vector (embedding) and BM25 indexing. The reported result is a 35% reduction in the retrieval failure rate. Cost concerns are mitigated because the extra LLM calls happen only at ingestion time, not at query time, and prompt caching can further reduce expenses.
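The ingestion step described above can be sketched roughly as follows. This is a minimal illustration, not the article's or Anthropic's implementation: the prompt wording, the `contextualize_chunks` helper, and the `fake_llm` stand-in are all assumptions made for the example, and a real pipeline would replace `fake_llm` with an actual model call.

```python
from typing import Callable, List

# Illustrative prompt only (an assumption, not Anthropic's exact wording).
# The full document comes first so that it forms a shared prefix across
# all chunks of the same document.
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is a chunk from that document:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Write a short context that situates this chunk within the document."
)


def contextualize_chunks(
    document: str, chunks: List[str], llm: Callable[[str], str]
) -> List[str]:
    """Prepend an LLM-generated context to each chunk (hypothetical helper)."""
    enriched = []
    for chunk in chunks:
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = llm(prompt).strip()
        # The enriched text is what gets embedded and BM25-indexed.
        enriched.append(f"{context}\n\n{chunk}")
    return enriched


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned context."""
    return "From ACME Corp's Q2 2023 financial report."


document = "ACME Corp Q2 2023 report. Revenue grew 3% over the previous quarter."
chunks = ["Revenue grew 3% over the previous quarter."]
enriched = contextualize_chunks(document, chunks, fake_llm)
print(enriched[0])
```

All of this runs once per chunk at ingestion time, so nothing here adds latency to queries. Because the document portion of the prompt is identical for every chunk of the same document, it is the part that prompt caching would be expected to reuse.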

9 min read · From towardsdatascience.com
Table of contents
What about context?
What about contextual retrieval?
Reducing cost with prompt caching
On my mind
