ARC-AGI-3 is a new interactive reasoning benchmark designed to measure human-like intelligence in AI agents. Unlike static puzzle benchmarks, it requires agents to explore novel environments, acquire goals dynamically, build world models, and learn continuously from experience without natural-language instructions. It measures intelligence across time by evaluating skill-acquisition efficiency, long-horizon planning with sparse feedback, and experience-driven adaptation. The benchmark includes replayable runs, a developer toolkit for agent integration, and an interactive UI for evaluation. A perfect score means an AI agent can beat every environment as efficiently as a human.
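The last sentence above ties a perfect score to human-level efficiency, and a small sketch can make that idea concrete. The formula here is an assumption for illustration only, not the benchmark's official scoring rule: each environment earns the ratio of human actions to agent actions, capped at 1, and unsolved environments earn 0. The names `EnvResult`, `env_score`, and `benchmark_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EnvResult:
    """Outcome of one agent run on one environment (hypothetical record)."""
    solved: bool
    agent_actions: int   # actions the agent took in this run
    human_actions: int   # assumed human baseline action count


def env_score(result: EnvResult) -> float:
    """Human-relative efficiency for one environment, in [0, 1].

    Assumed rule: an unsolved environment scores 0; a solved one scores
    human_actions / agent_actions, capped at 1 so that beating the human
    baseline does not earn extra credit.
    """
    if not result.solved or result.agent_actions <= 0:
        return 0.0
    return min(1.0, result.human_actions / result.agent_actions)


def benchmark_score(results: Sequence[EnvResult]) -> float:
    """Mean per-environment score; 1.0 only if every environment is
    solved at least as efficiently as the human baseline."""
    if not results:
        return 0.0
    return sum(env_score(r) for r in results) / len(results)
```

Under this assumed rule, an agent that solves one environment in twice the human action count and fails a second one would score (0.5 + 0.0) / 2 = 0.25, while only an agent that matches or beats the human baseline everywhere reaches 1.0.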
Table of contents
What is ARC-AGI-3?