From x.com

Robert Youssef @rryssf_

RT @rryssf_: a benchmark isn't a dataset. it's a triplet: dataset, model, judge. new paper audited Omni-MATH (Olympiad-level math) and fou…
