Anthropic confirmed in March 2026 that it has been throttling Claude session limits during weekday peak hours (5 AM–11 AM Pacific), causing heavy users to exhaust their quotas faster than expected. The throttling affects roughly 7% of users—primarily Pro and Max subscribers doing token-intensive work like coding or running agents—while API users paying per token are unaffected.

The root cause is a combination of real infrastructure constraints (specialized AI accelerator chips like NVIDIA H100s, AWS Trainium2, and Google TPUs are physically scarce and expensive) and unsustainable unit economics (Anthropic has spent over $10 billion on inference and training while generating only $5 billion in cumulative revenue). A simultaneous bug in Claude Code's prompt caching system caused 10–20x token overconsumption, compounding the problem.

The situation exposes a fundamental tension between the traditional cloud promise of elastic infinite scale and the reality of frontier AI inference, where each token requires fixed GPU compute that cannot be provisioned on demand. Practical advice for developers includes scheduling batch jobs during off-peak hours, preferring API access over subscriptions for production workloads, and monitoring token consumption closely.
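The off-peak scheduling and token-monitoring advice can be sketched in a few lines. This is an illustrative sketch, not Anthropic tooling: the peak window (weekdays, 5 AM–11 AM Pacific) comes from the article, while the `TokenBudget` class and its session limit are hypothetical names invented here. In real use, the per-request token counts would come from the API response's usage metadata.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")

def in_peak_window(ts: datetime) -> bool:
    """True if ts falls in the weekday 5 AM-11 AM Pacific peak window
    described in the article. Schedule batch jobs for when this is False."""
    local = ts.astimezone(PACIFIC)
    return local.weekday() < 5 and 5 <= local.hour < 11

class TokenBudget:
    """Hypothetical session-quota tracker: accumulate per-request token
    counts and check headroom before launching more token-intensive work."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Both directions count against quota-style limits.
        self.used += input_tokens + output_tokens

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)
```

For example, a nightly batch runner might skip or defer work whenever `in_peak_window(datetime.now(tz=PACIFIC))` is true, and abort early once `budget.remaining` drops below the expected cost of the next request.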

12 min read · From ardalis.com
Table of contents

- Where Claude Actually Lives
- What Actually Happened: The March 2026 Throttling
- Is It Cost Savings or Real Infrastructure Limits?
- What This Tells Us About “Cloud Scale” for AI
- What It Means for Developers
- The Bigger Picture
- References
