Users on Claude Code's Pro Max 5x plan are experiencing quota exhaustion within 1.5 hours despite moderate usage. Investigation points to cache_read tokens potentially counting at full rate against rate limits rather than the expected 1/10 rate, negating prompt caching benefits for quota purposes. Community analysis using a telemetry interceptor (claude-code-cache-fix) collected data across multiple quota windows and found that cache_read=0.0x (not counting toward quota) best fits the data, contradicting the full-rate hypothesis. Additional compounding factors include a silent regression reducing cache TTL from 1 hour to 5 minutes (causing frequent expensive cache_create events), context window inflation, background sessions consuming shared quota, and auto-compact spikes. Workarounds suggested include keeping sessions short, using /compact aggressively, and front-loading context in CLAUDE.md.
Table of contents
SummaryEnvironmentData Collection MethodMeasured Token ConsumptionThe ProblemContext Size ProgressionSpecific IssuesReproductionExpected BehaviorActual BehaviorSuggested ImprovementsSort: