A research paper introduces a benchmark of ten previously unpublished, research-level mathematics questions for evaluating how well AI systems solve advanced mathematical problems. The questions arose naturally from the authors' own research; the answers are known but are kept encrypted for a limited time so that current AI capabilities can be tested fairly.