ARC-AGI-3 is the first interactive reasoning benchmark for AI agents. Play it as a human, or build agents that learn in novel environments.

Hacker News

ARC-AGI-3 is a new interactive reasoning benchmark designed to measure human-like intelligence in AI agents. Unlike static puzzle benchmarks, it requires agents to explore novel environments, acquire goals dynamically, build world models, and learn continuously from experience without natural-language instructions. It measures intelligence across time by evaluating skill-acquisition efficiency, long-horizon planning with sparse feedback, and experience-driven adaptation. The benchmark includes replayable runs, a developer toolkit for agent integration, and an interactive UI for evaluation. A perfect score means an AI agent can beat every environment as efficiently as a human.
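One way to make the "as efficiently as a human" criterion concrete is a small scoring sketch. This is an assumption, not the official ARC-AGI-3 metric. It uses action count as the measure of efficiency and compares each solved run against a human baseline with a capped human-to-agent ratio. It then averages that ratio across environments. The names `EnvResult`, `env_efficiency`, and `benchmark_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnvResult:
    """Outcome of one agent run on one environment (hypothetical record)."""
    solved: bool
    agent_actions: int   # actions the agent took in this run
    human_actions: int   # hypothetical human baseline action count


def env_efficiency(r: EnvResult) -> float:
    """Hypothetical per-environment efficiency in [0, 1].

    Unsolved runs (or runs with no recorded actions) score 0. A solved run
    scores human_actions / agent_actions, capped at 1 so that beating the
    human baseline earns no extra credit.
    """
    if not r.solved or r.agent_actions <= 0:
        return 0.0
    return min(1.0, r.human_actions / r.agent_actions)


def benchmark_score(results: list[EnvResult]) -> float:
    """Mean efficiency across environments. The result is 1.0 only if every
    environment is solved in no more actions than the human baseline."""
    if not results:
        return 0.0
    return sum(env_efficiency(r) for r in results) / len(results)


runs = [
    EnvResult(solved=True, agent_actions=40, human_actions=40),    # 1.0
    EnvResult(solved=True, agent_actions=100, human_actions=50),   # 0.5
    EnvResult(solved=False, agent_actions=300, human_actions=60),  # 0.0
]
print(f"{benchmark_score(runs):.3f}")  # 0.500
```

Each ratio is capped at 1, so a perfect score of 1.0 requires solving every environment in no more actions than the human baseline. That matches the description above. The real benchmark may measure cost, weight environments, or aggregate results differently.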