Reasoning models such as Gemini Flash and GPT-5.2 generate hidden 'thinking tokens' (internal chain-of-thought) that are billed at output-token rates but never appear in the response. A Stanford/UC Berkeley study found that in 21.8% of model-pair comparisons, the model with the lower list price actually costs more in practice; in one benchmark, a model listed as 1.7x cheaper ended up costing 28x more. This creates a denial-of-wallet attack surface: an adversary can craft legitimate-looking inputs that trigger expensive reasoning paths and exhaust an API budget. Defenses include tracking thinking tokens per request, setting per-request cost ceilings, benchmarking actual workloads before choosing a model, and routing simple queries to non-reasoning models. Providers, for their part, should standardize per-request controls on thinking-token budgets.
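
The billing effect and the per-request ceiling described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own monitor: the prices, token counts, and the names `Pricing`, `Usage`, `request_cost`, and `enforce_ceiling` are all hypothetical, and it assumes the provider reports thinking tokens separately in its usage data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical list prices in USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float  # thinking tokens are billed at this rate too


@dataclass(frozen=True)
class Usage:
    """Token counts for one request (assumed to come from usage metadata)."""
    input_tokens: int
    visible_output_tokens: int
    thinking_tokens: int


class CostCeilingExceeded(Exception):
    """Raised when a single request costs more than the allowed ceiling."""


def request_cost(pricing: Pricing, usage: Usage) -> float:
    """Return the billed cost in USD, counting hidden thinking tokens as output."""
    billed_output = usage.visible_output_tokens + usage.thinking_tokens
    return (
        usage.input_tokens * pricing.input_per_mtok
        + billed_output * pricing.output_per_mtok
    ) / 1_000_000


def enforce_ceiling(pricing: Pricing, usage: Usage, ceiling_usd: float) -> float:
    """Return the request cost, or raise if it exceeds the per-request ceiling."""
    cost = request_cost(pricing, usage)
    if cost > ceiling_usd:
        raise CostCeilingExceeded(
            f"request cost ${cost:.4f} exceeds ceiling ${ceiling_usd:.4f}"
        )
    return cost


# Made-up numbers: the "cheap" model has half the list price but thinks a lot.
CHEAP_LISTED = Pricing(input_per_mtok=0.5, output_per_mtok=2.0)
PRICEY_LISTED = Pricing(input_per_mtok=1.0, output_per_mtok=4.0)

CHEAP_USAGE = Usage(input_tokens=1_000, visible_output_tokens=500, thinking_tokens=20_000)
PRICEY_USAGE = Usage(input_tokens=1_000, visible_output_tokens=500, thinking_tokens=0)

cheap_cost = request_cost(CHEAP_LISTED, CHEAP_USAGE)
pricey_cost = request_cost(PRICEY_LISTED, PRICEY_USAGE)
```

With these invented figures, the model with the lower list price is the more expensive one per request, because the 20,000 hidden thinking tokens dominate the bill. Note that a check like `enforce_ceiling` runs after the response arrives, so it detects and flags an overrun rather than preventing it; stopping the spend up front would depend on whatever token-budget controls the provider exposes.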

10m read time · From infosecwriteups.com
Table of contents
What Happened to My API Bill
The Research That Explains It
Why This Is a Security Problem
Building a Quick Cost Monitor
How an Attacker Exploits This
The Irreducible Problem
Defensive Measures That Actually Help
What the Industry Needs to Do
