Gemini 3.1 Pro (Fully Tested): This MODEL is ACTUALLY BAD & A MESS.
Gemini 3.1 Pro is benchmarked against its predecessor and competing models using a custom evaluation suite (KingBench) that covers both one-shot and agentic coding tasks. On one-shot tasks it scored 96%, against a perfect 100% for Gemini 3 Pro, at more than double the cost. On agentic tasks its standing fell sharply, from rank 7 (score 71.4) for Gemini 3 Pro to rank 19 (score 49.2). Key behavioral issues include excessive, repetitive planning phases lasting up to 2 minutes, failure to use agentic tool APIs correctly, redundant thinking loops, and basic coding mistakes such as duplicate method declarations and incorrect package names. Despite strong official benchmark numbers (ARC-AGI-2, SWE-bench), its real-world agentic performance lags far behind Claude Sonnet 4.6, GLM-5, and even the previous Gemini 3 Pro. The model is considered worthwhile only on free tiers; paying API users are advised to consider alternatives.
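To put the reported numbers in proportion, the short sketch below computes the absolute and relative change between the two models from the scores quoted above. The function name `score_change` and the variable names are illustrative only and are not part of KingBench; the only inputs are the four scores stated in the summary.

```python
def score_change(old: float, new: float) -> tuple[float, float]:
    """Return (absolute change in points, relative change in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    return absolute, relative


# Scores as quoted in the summary: Gemini 3 Pro first, Gemini 3.1 Pro second.
agentic_abs, agentic_rel = score_change(71.4, 49.2)
one_shot_abs, one_shot_rel = score_change(100.0, 96.0)

# Rank positions on the agentic leaderboard, also as quoted.
rank_drop = 19 - 7

print(f"Agentic:  {agentic_abs:+.1f} points ({agentic_rel:+.1f}%), {rank_drop} places lower")
print(f"One-shot: {one_shot_abs:+.1f} points ({one_shot_rel:+.1f}%)")
```

On these figures the agentic score falls by 22.2 points, roughly 31% relative to Gemini 3 Pro, while the one-shot score falls by 4 points.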