Anthropic confirmed in March 2026 that it has been throttling Claude session limits during weekday peak hours (5 AM–11 AM Pacific), causing heavy users to exhaust their quotas faster than expected. The throttling affects roughly 7% of users—primarily Pro and Max subscribers doing token-intensive work like coding or running agents—while API users paying per token are unaffected.

The root cause is a combination of real infrastructure constraints (specialized AI accelerator chips like NVIDIA H100s, AWS Trainium2, and Google TPUs are physically scarce and expensive) and unsustainable unit economics (Anthropic has spent over $10 billion on inference and training while generating only $5 billion in cumulative revenue). A simultaneous bug in Claude Code's prompt caching system caused 10–20x token overconsumption, compounding the problem.

The situation exposes a fundamental tension between the traditional cloud promise of elastic infinite scale and the reality of frontier AI inference, where each token requires fixed GPU compute that cannot be provisioned on demand. Practical advice for developers includes scheduling batch jobs during off-peak hours, preferring API access over subscriptions for production workloads, and monitoring token consumption closely.
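The off-peak scheduling and token-monitoring advice can be sketched in a few lines. This is an illustrative sketch, not Anthropic tooling: the peak window (weekdays, 5 AM–11 AM Pacific) comes from the article, while the `TokenBudget` class and its session limit are hypothetical names invented here. In real use, the per-request token counts would come from the API response's usage metadata.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")

def in_peak_window(ts: datetime) -> bool:
    """True if ts falls in the weekday 5 AM-11 AM Pacific peak window
    described in the article. Schedule batch jobs for when this is False."""
    local = ts.astimezone(PACIFIC)
    return local.weekday() < 5 and 5 <= local.hour < 11

class TokenBudget:
    """Hypothetical session-quota tracker: accumulate per-request token
    counts and check headroom before launching more token-intensive work."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Both directions count against quota-style limits.
        self.used += input_tokens + output_tokens

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)
```

For example, a nightly batch runner might skip or defer work whenever `in_peak_window(datetime.now(tz=PACIFIC))` is true, and abort early once `budget.remaining` drops below the expected cost of the next request.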

12 min read · From ardalis.com
Table of contents

- Where Claude Actually Lives
- What Actually Happened: The March 2026 Throttling
- Is It Cost Savings or Real Infrastructure Limits?
- What This Tells Us About “Cloud Scale” for AI
- What It Means for Developers
- The Bigger Picture
- References
