Reasoning models like Gemini Flash and GPT-5.2 generate hidden "thinking tokens" (internal chain-of-thought) that are billed at output token rates but never shown in responses. A Stanford/UC Berkeley study found that in 21.8% of model pair comparisons, the model with the cheaper list price actually costs more in practice; in one benchmark, a model listed as 1.7x cheaper cost 28x more. This creates a denial-of-wallet attack surface: adversaries can craft legitimate-looking inputs that trigger expensive reasoning paths and blow through API budgets. Defenses include tracking thinking tokens per request, setting per-request cost ceilings, benchmarking actual workloads before selecting a model, and routing simple queries to non-reasoning models. Providers, for their part, should standardize per-request controls on thinking token budgets.
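To make the first two defenses concrete, here is a minimal sketch of per-request cost tracking with a ceiling check. It assumes the provider's usage report exposes thinking tokens as a separate count. The `Pricing` and `Usage` types, the function names, and every price and token count below are hypothetical illustrations, not real provider rates or figures from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical list prices in USD per million tokens (not real rates)."""
    input_per_million: float
    output_per_million: float


@dataclass(frozen=True)
class Usage:
    """Token counts for one request (assumed to come from a usage report)."""
    input_tokens: int
    visible_output_tokens: int
    thinking_tokens: int  # hidden reasoning tokens, billed at the output rate


def request_cost(usage: Usage, pricing: Pricing) -> float:
    """Return the billed cost in USD, counting thinking tokens as output."""
    billed_output = usage.visible_output_tokens + usage.thinking_tokens
    total = (
        usage.input_tokens * pricing.input_per_million
        + billed_output * pricing.output_per_million
    )
    return total / 1_000_000


def over_ceiling(usage: Usage, pricing: Pricing, ceiling_usd: float) -> bool:
    """Return True when a single request costs more than the allowed ceiling."""
    return request_cost(usage, pricing) > ceiling_usd


# Hypothetical comparison: model A has half the list price of model B,
# but spends far more hidden thinking tokens on the same prompt.
model_a_pricing = Pricing(input_per_million=1.0, output_per_million=4.0)
model_b_pricing = Pricing(input_per_million=2.0, output_per_million=8.0)

model_a_usage = Usage(input_tokens=1_000, visible_output_tokens=500, thinking_tokens=20_000)
model_b_usage = Usage(input_tokens=1_000, visible_output_tokens=500, thinking_tokens=500)

cost_a = request_cost(model_a_usage, model_a_pricing)
cost_b = request_cost(model_b_usage, model_b_pricing)

if __name__ == "__main__":
    print(f"model A: ${cost_a:.4f}, model B: ${cost_b:.4f}")
    print("A over $0.05 ceiling:", over_ceiling(model_a_usage, model_a_pricing, 0.05))
```

With these made-up numbers, the model with the lower list price ends up more expensive per request, because the thinking tokens dominate the bill. Logging `request_cost` for every call, and alerting or aborting when `over_ceiling` is true, is one simple way to notice both accidental and adversarial cost spikes.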
Table of contents
What Happened to My API Bill
The Research That Explains It
Why This Is a Security Problem
Building a Quick Cost Monitor
How an Attacker Exploits This
The Irreducible Problem
Defensive Measures That Actually Help
What the Industry Needs to Do