Thinking Tokens Are the New Denial-of-Wallet Attack Surface

Reasoning models like Gemini Flash and GPT-5.2 generate hidden 'thinking tokens' (internal chain-of-thought) that are billed at output-token rates but never shown in responses. A Stanford/UC Berkeley study found that in 21.8% of model-pair comparisons, the model with the lower list price actually costs more in practice; in one benchmark, a model listed as 1.7x cheaper cost 28x more. This creates a denial-of-wallet attack surface: adversaries can craft legitimate-looking inputs that trigger expensive reasoning paths and blow through API budgets. Defenses include tracking thinking tokens per request, setting per-request cost ceilings, benchmarking actual workloads before selecting a model, and routing simple queries to non-reasoning models. Providers should also standardize per-request thinking-token budget controls.
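To make the first two defenses concrete, here is a minimal sketch of per-request cost accounting that counts thinking tokens as billed output and checks the result against a ceiling. It is not taken from the article: the `Usage` and `Pricing` types, their field names, and every price and token count are hypothetical, and real provider responses report these counts under their own field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts for one request (hypothetical field names)."""
    input_tokens: int
    visible_output_tokens: int
    thinking_tokens: int  # hidden reasoning, billed at the output rate


@dataclass(frozen=True)
class Pricing:
    """Illustrative USD prices per one million tokens, not real list prices."""
    input_per_million: float
    output_per_million: float


def request_cost(usage: Usage, pricing: Pricing) -> float:
    """Cost of one request in USD, counting thinking tokens as output."""
    billed_output = usage.visible_output_tokens + usage.thinking_tokens
    return (
        usage.input_tokens * pricing.input_per_million
        + billed_output * pricing.output_per_million
    ) / 1_000_000


def thinking_share(usage: Usage) -> float:
    """Fraction of billed output tokens the caller never sees."""
    billed_output = usage.visible_output_tokens + usage.thinking_tokens
    return usage.thinking_tokens / billed_output if billed_output else 0.0


def exceeds_ceiling(usage: Usage, pricing: Pricing, ceiling_usd: float) -> bool:
    """True when a single request costs more than the configured ceiling."""
    return request_cost(usage, pricing) > ceiling_usd


if __name__ == "__main__":
    # Made-up numbers: the reasoning model has the lower list price but
    # spends 20,000 hidden thinking tokens on the same prompt.
    reasoning = (Usage(1_000, 500, 20_000), Pricing(1.0, 5.0))
    plain = (Usage(1_000, 500, 0), Pricing(2.0, 10.0))
    for name, (usage, pricing) in {"reasoning": reasoning, "plain": plain}.items():
        print(
            name,
            f"cost=${request_cost(usage, pricing):.4f}",
            f"thinking_share={thinking_share(usage):.0%}",
            f"over_5_cents={exceeds_ceiling(usage, pricing, 0.05)}",
        )
```

With these made-up numbers, the model with half the list price ends up costing more per request, because nearly all of its billed output is hidden reasoning. A ceiling check like this can only flag a request after the usage is reported, so it works best alongside a provider-side thinking budget where one is available.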

#finops

#ai-security

Mar 31 • 10m read time • From infosecwriteups.com

Table of contents

What Happened to My API Bill
The Research That Explains It
Why This Is a Security Problem
Building a Quick Cost Monitor
How an Attacker Exploits This
The Irreducible Problem
Defensive Measures That Actually Help
What the Industry Needs to Do
