A team of 50 developers can burn $30K/month on AI coding assistants without noticing. Cost is only part of the problem: as the context window fills, model recall degrades (context rot) and latency grows.

The post presents eight practical levers for reducing token usage: scoping requests with explicit file references, ordering prompts for cache hits, disabling unused MCP servers, codifying team conventions in instruction files, routing tasks to cheaper models by default, requesting diffs instead of full files, tightening .gitignore for the indexer, and using latency as a token meter.

Three workflow patterns compound these savings: the 'Ralph Wiggum loop' (an agent works through a TODO.md checklist with a cheap model, one task per fresh chat), auto-compact (when the context is 60-80% full, summarize progress into plan.md and continue in a new chat), and a Planner→Implementer→Reviewer split (an expensive model plans once, a cheap model implements, and the expensive model reviews the diff). A Monday checklist ties it all together.
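The Ralph Wiggum loop and the auto-compact trigger described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the post's implementation: the model name `cheap-model`, the `run_agent` hook, the 0.7 threshold, and the `- [ ]` / `- [x]` checklist format are all hypothetical, and no real assistant API is called.

```python
import re
from typing import Callable, Optional

# Hypothetical name standing in for the "cheap default" model tier.
CHEAP_MODEL = "cheap-model"

# Assumed TODO.md format: Markdown task-list items like "- [ ] task" or "- [x] task".
TASK_RE = re.compile(r"^(\s*[-*] \[)( |x|X)(\] )(.*)$")


def should_compact(used_tokens: int, context_window: int, threshold: float = 0.7) -> bool:
    """Return True once context usage reaches the compaction threshold.

    The post suggests compacting at 60-80% of the context window;
    0.7 is an assumed midpoint, not a value taken from the post.
    """
    if not 0.6 <= threshold <= 0.8:
        raise ValueError("threshold outside the 60-80% band")
    return used_tokens >= threshold * context_window


def next_task(todo_md: str) -> Optional[str]:
    """Return the text of the first unchecked item, or None if all are done."""
    for line in todo_md.splitlines():
        m = TASK_RE.match(line)
        if m and m.group(2) == " ":
            return m.group(4)
    return None


def mark_done(todo_md: str, task: str) -> str:
    """Tick the first unchecked item whose text equals `task`."""
    lines = todo_md.splitlines()
    for i, line in enumerate(lines):
        m = TASK_RE.match(line)
        if m and m.group(2) == " " and m.group(4) == task:
            lines[i] = f"{m.group(1)}x{m.group(3)}{m.group(4)}"
            break
    return "\n".join(lines)


def ralph_loop(todo_md: str, run_agent: Callable[[str, str], None]) -> str:
    """Work through the checklist one task at a time.

    `run_agent(model, task)` is a hypothetical hook; a real version should
    open a fresh chat for every call so no context carries over between tasks.
    """
    while (task := next_task(todo_md)) is not None:
        run_agent(CHEAP_MODEL, task)
        todo_md = mark_done(todo_md, task)
    return todo_md
```

For example, with `todo = "- [ ] add logging\n- [x] fix typo\n- [ ] write tests"`, calling `ralph_loop(todo, lambda model, task: calls.append((model, task)))` invokes the hook once per unchecked task and returns the checklist with every item ticked. Keeping each task in its own call is what holds every chat's context small; `should_compact` covers the case where a single task still grows too large.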

From foojay.io (16 min read)
Table of contents
- Where the tokens actually go
- The Eight Levers
- Three workflow patterns that compound
- The Monday checklist
- Closing
