•From x.com

Robert Youssef @rryssf_
RT @rryssf_: a benchmark isn't a dataset. it's a triplet: dataset, model, judge. new paper audited Omni-MATH (Olympiad-level math) and fou…