InfoQ

Google DeepMind announced Aletheia, a fully autonomous AI agent built on Gemini 3 Deep Think that solved 6 out of 10 novel, unpublished research-level math problems in the FirstProof challenge. The system operates zero-shot, without human hints, using a multi-agent loop of Generator, Verifier, and Reviser components plus Google Search integration. Notably, it outputs 'No solution found' rather than hallucinating an answer, which researchers consider a key reliability feature. Aletheia also scored ~91.9% on IMO-ProofBench. For comparison, OpenAI tackled the same challenge with an unreleased model, initially claiming 6 solutions but later revising that figure to 5 after one was found to be flawed. Unlike Aletheia, OpenAI's attempt used limited human supervision. Researchers acknowledge that Aletheia still makes more errors than human experts and shows a tendency toward specification gaming, so full autonomy in math research remains an open challenge.
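The Generator, Verifier, and Reviser loop described above can be pictured with a minimal Python sketch. This is not DeepMind's implementation: the function `solve`, the `Verdict` type, the `max_rounds` limit, and the toy arithmetic components are all hypothetical names chosen for illustration, and the Google Search step is left out. The sketch only shows the general pattern of generating a candidate, checking it, revising on failure, and abstaining when nothing passes verification.

```python
from dataclasses import dataclass
from typing import Callable

# Abstention message quoted in the article; everything else here is hypothetical.
NO_SOLUTION = "No solution found"


@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""


def solve(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,  # assumed budget, not a documented parameter
) -> str:
    """Generate a candidate, then alternate verification and revision.

    Returns the first candidate the verifier accepts, or NO_SOLUTION
    if the budget runs out, so an unverified answer is never returned.
    """
    candidate = generate(problem)
    for attempt in range(max_rounds):
        verdict = verify(problem, candidate)
        if verdict.accepted:
            return candidate
        # Only revise if another verification round is still available.
        if attempt < max_rounds - 1:
            candidate = revise(problem, candidate, verdict.feedback)
    return NO_SOLUTION


# Toy stand-ins for the three components: the "proof" is a sum to check.
def toy_generate(problem: str) -> str:
    return "4 + 4 = 9"  # deliberately wrong first draft


def toy_verify(problem: str, candidate: str) -> Verdict:
    left, right = candidate.split("=")
    ok = sum(int(term) for term in left.split("+")) == int(right)
    return Verdict(ok, "" if ok else "arithmetic does not check out")


def toy_revise(problem: str, candidate: str, feedback: str) -> str:
    left, _ = candidate.split("=")
    total = sum(int(term) for term in left.split("+"))
    return f"{left.strip()} = {total}"


if __name__ == "__main__":
    print(solve("4 + 4", toy_generate, toy_verify, toy_revise))
    # A reviser that never fixes the draft leads to abstention.
    print(solve("4 + 4", toy_generate, toy_verify, lambda p, c, f: c))
```

In this sketch the abstention path is the default return value, so a candidate reaches the caller only after the verifier has accepted it. That mirrors the reliability property the article highlights, under the assumption that the real system gates its output in a broadly similar way.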

Google’s Aletheia Advances the State of the Art of Fully Autonomous Agentic Math Research