Pass@k is Mostly Bunk
The pass@k metric, commonly used to evaluate AI agents, is fundamentally flawed because it is exponentially forgiving. It measures the probability that at least one of k attempts succeeds, so it reports misleadingly high success rates even for poor-performing models. A model with only a 5% success rate can show 99.4%
•2m read time• From brooker.co.za
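To make "exponentially forgiving" concrete, here is a minimal sketch of the arithmetic, assuming every attempt succeeds independently with the same probability p. That independence assumption and the helper name `pass_at_k` are illustrative and do not come from the linked article. Under this assumption the chance that at least one of k attempts succeeds is 1 - (1 - p)^k, and a 5% per-attempt rate reaches roughly 99.4% at k = 100.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k attempts succeeds.

    Assumes attempts are independent and each succeeds with probability p
    (an illustrative simplification, not the article's stated method).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    # All k attempts fail with probability (1 - p) ** k,
    # so at least one succeeds with the complement.
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Watch a 5% per-attempt success rate inflate as k grows.
    for k in (1, 10, 50, 100):
        print(f"pass@{k} = {pass_at_k(0.05, k):.1%}")
```

The failure term (1 - p)^k shrinks geometrically in k, so nearly any nonzero p approaches 100% once k is large enough.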