People are using Super Mario to benchmark AI now
Researchers at Hao AI Lab at the University of California San Diego used Super Mario Bros. as a benchmark for AI models. Anthropic’s Claude 3.7 outperformed the other models tested, including OpenAI’s GPT-4o, and reasoning models such as OpenAI’s o1 struggled, in part because they take longer to decide on each action in a real-time game. The game ran in an emulator integrated with the GamingAgent framework, which provided instructions to the AI. The study adds to the ongoing debate about how useful games are as AI benchmarks.