The ORCA Benchmark, a set of 500 practical math questions, evaluated a new round of leading LLMs, including ChatGPT 5.2, Gemini 3 Flash, Grok 4.1, and DeepSeek V3.2. Gemini 3 Flash led with 72.8% accuracy, while the others scored between 54% and 60%. All models improved except Grok 4.1, which regressed. A key finding is that
•5m read time• From go.theregister.com