A first attempt at autoresearching a better BM25

Software Doug

An experiment using an AI coding agent to iteratively improve BM25 search ranking on the MSMarco passage retrieval dataset. The agent starts with a baseline BM25 implementation and proposes code changes, accepting only those that improve NDCG on validation data. After 8 rounds, the agent had discovered stopword removal for longer queries and a bigram phrase boost, reaching an MRR near 0.2. However, gains plateaued because the agent overfit to the minimarco sample: its stopword list included odd entries like 'medicine' and 'vacation' that leaked from the validation set. The post reflects on lessons learned about data leakage in automated tuning and outlines future directions, including better context management and using the full dataset.
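The accept-only-if-NDCG-improves loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the experiment: the names `ndcg_at_k`, `mean_ndcg`, and `accept_improvements` are hypothetical, relevance is assumed to be binary, and each "proposed change" is modeled as a ranker function rather than as an agent-written code edit.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Assumption: a ranker maps a query string to a ranked list of passage ids.
Ranker = Callable[[str], List[str]]


def ndcg_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k for one query (assumes ranked_ids has no duplicates)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    # Ideal ranking puts every relevant passage at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg(ranker: Ranker, queries: Dict[str, str],
              qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Average NDCG@k over the validation queries."""
    scores = [ndcg_at_k(ranker(text), qrels.get(qid, set()), k)
              for qid, text in queries.items()]
    return sum(scores) / len(scores) if scores else 0.0


def accept_improvements(baseline: Ranker, candidates: Sequence[Ranker],
                        queries: Dict[str, str], qrels: Dict[str, Set[str]],
                        k: int = 10) -> Tuple[Ranker, float]:
    """Greedy loop: keep a candidate only if it beats the best validation score so far."""
    best, best_score = baseline, mean_ndcg(baseline, queries, qrels, k)
    for candidate in candidates:
        score = mean_ndcg(candidate, queries, qrels, k)
        if score > best_score:  # strict improvement required
            best, best_score = candidate, score
    return best, best_score
```

Because every acceptance decision is made against the same validation sample, a loop like this can reward changes that only fit that sample, which is the kind of leakage the summary describes.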

Autoresearching BM25 on MSMarco