BM25 scores are unbounded, which makes them hard to combine with probabilistic signals like embedding similarities. Bayesian BM25 (BB25) is a framework for converting BM25 into a calibrated probability, using a Bayesian prior (built from term frequency and field norms) and a sigmoid-scaled likelihood. The posterior is computed via Bayes' theorem, and the parameters are tuned via gradient descent against labeled data. The post also covers alternative approaches: scaling other signals to match BM25's scale (as in Elasticsearch's rank feature query), learning boost weights directly through random or Bayesian optimization, and normalizing Lucene's BM25 by dividing IDF by log(num_docs). The core insight is that calibrated probabilities enable principled hybrid search fusion rather than ad-hoc score blending.
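As a rough sketch of the mechanics (the parameter names `prior`, `scale`, and `midpoint` are hypothetical stand-ins for the post's tuned parameters, and the prior construction is simplified away), the posterior combines a prior probability of relevance with a sigmoid likelihood of the BM25 score:

```python
import math

def bb25_probability(bm25_score: float, prior: float,
                     scale: float, midpoint: float) -> float:
    """Posterior P(relevant | BM25 score) via Bayes' theorem.

    The likelihood of the score under relevance is modeled as a
    sigmoid; (1 - sigmoid) serves as the likelihood under
    non-relevance. In BB25, `scale` and `midpoint` would be fit by
    gradient descent against labeled data, and `prior` would come
    from term statistics; the values here are illustrative only.
    """
    likelihood = 1.0 / (1.0 + math.exp(-scale * (bm25_score - midpoint)))
    evidence = prior * likelihood + (1.0 - prior) * (1.0 - likelihood)
    return prior * likelihood / evidence

# Once both signals live on a probability scale, fusion can be
# principled rather than ad-hoc, e.g. a Bayes-style combination
# under a naive independence assumption:
p_lex = bb25_probability(bm25_score=12.3, prior=0.05, scale=0.4, midpoint=10.0)
p_sem = 0.72  # a calibrated embedding-similarity probability (illustrative)
hybrid = (p_lex * p_sem) / (p_lex * p_sem + (1 - p_lex) * (1 - p_sem))
```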

From softwaredoug.com (11 min read)
Table of contents:
- BM25: probabilistic but not a probability
- BB25 - Bayesian BM25 → better hybrid search
- BB25 likelihood - BM25 scaled
- The final BB25 probability
- Calibration of probability
- Critiques and other approaches
