A preface to an upcoming book on the science of machine learning benchmarks. It argues that while benchmarks have well-documented flaws — gaming, overfitting, bias, labor exploitation — they have undeniably driven ML progress. The author explores why benchmarks worked despite lacking rigorous statistical foundations, focusing