SnapBench is a spatial-reasoning benchmark that tests vision-language models by having them pilot a drone through a 3D voxel world to locate and identify creatures. The results were surprising: Gemini Flash, the cheapest of the seven frontier models tested, was the only one to complete the task. The key differentiator was altitude control; most models failed to descend to ground level, where the creatures were located. The simulation is written in Zig, the orchestration in Rust, and the benchmarking in Python, with all models accessed through the OpenRouter API.
Table of contents

- Gotta catch 'em all?
- Why can't Claude look down?
- The two-creature anomaly
- Bigger ≠ better
- Color theory, maybe
- Prior work
- Rough edges
- Try it yourself
- Where this could go
- Attribution