Reciprocal Rank Fusion (RRF): How It Works and When to Use It

Reciprocal Rank Fusion (RRF) is a rank aggregation algorithm that merges results from multiple retrievers (typically BM25 and vector search) by summing the reciprocal of each document's rank in every result list: RRF_score(d) = Σ_r 1/(k + rank_r(d)), where the sum runs over the retrievers r and rank_r(d) is the 1-based position of document d in retriever r's list. Unlike score normalization methods, RRF operates on rank positions rather than raw scores, which makes it robust to mismatched score distributions between retrievers. The default k=60 was established in a 2009 SIGIR paper and has proven durable across benchmarks. The post covers the math behind RRF, a worked example showing why a consistent mid-list ranking in every list beats a #1 ranking in a single list, comparisons with min-max and L2 normalization, and concrete implementation examples for OpenSearch, Elasticsearch, Azure AI Search, MongoDB Atlas, pgvector/PostgreSQL, and Weaviate. Production tuning guidance covers rank window size, weighted RRF variants, evaluation metrics (NDCG@10, Recall@k, MRR), and common pitfalls such as deduplication failures and mismatched document IDs.
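The formula can be sketched in a few lines of Python. This is a minimal illustration rather than any engine's actual implementation: the function name rrf_fuse, the document IDs, and the two toy result lists are hypothetical, and the sketch assumes each list is already deduplicated and uses the same document IDs across retrievers.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first (rank 1 first).
    k: smoothing constant; 60 is the commonly used default.
    Returns (doc_id, score) pairs sorted by descending score,
    with ties broken by doc_id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, matching the formula 1 / (k + rank).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: "C" is third in both, "A" and "D" are
# first in exactly one list each and missing from the other.
bm25_results = ["A", "B", "C"]
vector_results = ["D", "E", "C"]

fused = rrf_fuse([bm25_results, vector_results])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

In this toy input, "C" scores 2/63 (about 0.0317) while "A" and "D" each score 1/61 (about 0.0164), so the document that both retrievers agree on ranks first even though neither retriever placed it at the top.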

#elk

#vector-search

#opensearch

May 18 • 14m read time • From bigdataboutique.com

Table of contents

What Reciprocal Rank Fusion Actually Does
The Formula, Step by Step
RRF vs. Score Normalization
RRF in Practice Across Search Engines
Tuning RRF in Production
Frequently Asked Questions
Key Takeaways
