An empirical comparison of four embedding models (GloVe through text-embedding-3-large) and three cross-encoder rerankers (bge-base, bge-large, ms-marco-MiniLM) across five challenging RAG query shapes reveals that the expected cost-performance gradient mostly doesn't hold. On four of five test cases, rerankers either match or underperform strong embeddings. Only signal dilution in long context is a clear reranker win. Negation, out-of-domain vocabulary, listing queries, and exact identifiers at scale remain broken regardless of scorer. The article argues that upstream architectural choices — question parsing, classify-before-retrieve, and expert keyword dictionaries — deliver more value per dollar than stacking a reranker on weak retrieval.

20m read time · From towardsdatascience.com
Table of contents
1. What a reranker actually is
2. The cost-perf gradient, tested on the same cases
3. Where the cross-encoder still breaks
4. Where rerankers actually justify their cost
5. Conclusion
6. Further reading
