BM25, or Best Match 25, is a widely used full-text search ranking algorithm found in systems such as Lucene/Elasticsearch and SQLite. It ranks documents by their estimated probability of being relevant to a search query. Its key components are the frequency of the query terms in each document, normalization by document length, and the inverse document frequency of each term. The scores are meant for ranking documents within a collection, not as absolute measures of relevance. In personalized content feeds, BM25 can be combined with vector similarity search so that results benefit from more accurate keyword matching.
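
To make the three components concrete, here is a minimal sketch of the commonly cited BM25 scoring formula, not the exact code of any of the systems named above. The function name `bm25_scores`, the whitespace tokenizer, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration. The IDF form with the `1 +` inside the logarithm is one common variant that keeps the value positive.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with a textbook BM25.

    Illustrative assumptions: lowercase whitespace tokenization,
    k1=1.2 and b=0.75 as tuning parameters.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Length normalization: documents longer than average are penalized.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            f = term_freq[term]
            if f == 0:
                continue
            # Inverse document frequency: rarer terms carry more weight.
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Term frequency saturates as f grows, controlled by k1.
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)
        scores.append(score)
    return scores
```

Because the IDF and average-length terms depend on the collection being searched, the resulting numbers are only meaningful for ordering documents within that same collection.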

12m read time. From emschwartz.me
Table of contents
Motivation: can BM25 scores be compared across queries?
Ranking documents probabilistically
Components of BM25
Behold, math!
Cleverness of BM25 and its precursors
Conclusion: BM25 scores can be compared within the same collection
Further reading
