Gemini 3.1 Pro is the smartest model ever made

This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).

Gemini 3.1 Pro achieves top benchmark scores across multiple evaluations — including a 78% on ARC AGI 2, near-perfect Convex scores, and 100% on a custom skateboarding spatial reasoning benchmark — at roughly half the cost of competing models like Opus 4.6. However, despite its raw intelligence and knowledge depth, the model is plagued by serious practical issues: unreliable tool calling, looping behavior, hallucinated packages, poor agentic task completion, and a deeply buggy official CLI that randomly switches to other models. The author contrasts Gemini's benchmark dominance with Anthropic's focus on consistent, reliable tool use (even in smaller models like Haiku), arguing Google is 'benchmaxing' at the expense of real-world usability. The conclusion: Gemini 3.1 Pro is the smartest model available but one of the least pleasant to actually use for coding and agentic workflows.

24m watch time

Sort: