In this video, I'll be telling you about Google's new Gemini 3.1 Pro and whether it's actually worth paying for. I ran it through my personal KingBench benchmarks on both one-shot and agentic tasks, and the results are honestly pretty disappointing compared to its predecessor.

--
Key Takeaways:

📉 Gemini 3.1 Pro regressed on one-shot tasks, dropping from Gemini 3 Pro's 100% to 96% while costing more than twice as much.
🤖 On agentic benchmarks, Gemini 3.1 Pro scored just 49.2, down from Gemini 3 Pro Preview's 71.4, and fell from rank 7 to rank 19.
⏳ The model has a serious over-planning problem, spending up to 114 seconds planning before writing a single line of code.
🛠️ It frequently misuses agentic tools, embeds questions into planning responses instead of using the proper ask tool, and makes basic coding mistakes.
💸 At $2 per million input tokens and $12 per million output tokens, it's hard to justify when there are far better alternatives like Sonnet 4.6 and GLM 5.
🆓 If you're on the free tier through Gemini CLI or Antigravity, it's a great option since 96% for free is hard to beat.
🏆 Sonnet 4.6 leads the agentic leaderboard at 87.9, while Gemini 3.1 Pro lags far behind at 49.2.
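
To put the pricing above in perspective, here is a minimal Python sketch of what a single request would cost at the listed rates ($2 per million input tokens, $12 per million output tokens). The token counts are made-up example values, not measurements from KingBench, and the function name is purely illustrative.

```python
# Rough cost estimate at the prices quoted in the takeaways.
# Token counts below are hypothetical, chosen only for illustration.

INPUT_PRICE_PER_M = 2.0    # USD per 1M input tokens (from the takeaways)
OUTPUT_PRICE_PER_M = 12.0  # USD per 1M output tokens (from the takeaways)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical agentic step: 50k tokens of context in, 10k tokens out.
    cost = estimate_cost(50_000, 10_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Because output tokens cost six times as much as input tokens, long planning and thinking phases add up quickly if they are billed as output.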

AICodeKing

Gemini 3.1 Pro is benchmarked against its predecessor and competing models using a custom evaluation suite (KingBench) covering both one-shot and agentic coding tasks. On one-shot tasks, it scored 96% versus Gemini 3 Pro's perfect 100%, while costing more than twice as much. On agentic tasks, performance dropped sharply from rank 7 (score 71.4) to rank 19 (score 49.2). Key behavioral issues include excessive and repetitive planning phases lasting up to roughly 2 minutes, failure to use agentic tool APIs correctly, redundant thinking loops, and basic coding mistakes like duplicate method declarations and incorrect package names. Despite strong official benchmark numbers (ARC-AGI-2, SWE-bench), real-world agentic performance lags far behind Claude Sonnet 4.6, GLM 5, and even the previous Gemini 3 Pro. The model is considered worthwhile only on free tiers; paying API users are advised to consider alternatives.

Gemini 3.1 Pro (Fully Tested): This MODEL is ACTUALLY BAD & A MESS.