A preface to an upcoming book on the science of machine learning benchmarks. It argues that while benchmarks have well-documented flaws — gaming, overfitting, bias, labor exploitation — they have undeniably driven ML progress. The author explores why benchmarks worked despite lacking rigorous statistical foundations, focusing on model rankings rather than absolute scores as the true scientific output. The book covers the holdout method, adaptivity problems, cross-validation, and transitions to LLM-era challenges: unknown training data contamination, multi-task aggregation problems (drawing on social choice theory), performativity, and the existential challenge of evaluating models that surpass human evaluators. The goal is to build a proper scientific foundation for benchmarking practice.

Table of contents
- Overview
- Who is this book for?
- Acknowledgments
