Token usage, dubbed "tokenmaxxing," is emerging as the AI era's equivalent of measuring lines of code: easy to track, easy to game, and disconnected from real productivity. Meta's internal token-consumption leaderboard sparked industry backlash after it became public. Engineering leaders share alternative frameworks: a four-level "cognitive delegation" model inspired by the executive chef analogy (measuring how much mental work engineers offload to AI agents), and a self-reporting approach that works in high-trust organizations. Experts agree token usage is a useful early adoption signal but a poor productivity metric, and that combined metrics, not a single number, are needed. No consensus exists yet on the right replacement, and tooling to connect token spend to shipping outcomes is still immature.
Table of contents
- The easiest metric to game
- From line cook to executive chef: a skill-level-based framework
- The opposite approach: self-reporting and trust
- Hard to measure, but it has to be measured
- The tokenmaxxing backlash