A longitudinal study by DX across 400+ engineering organizations found that as AI tool usage increased by 65% over 16 months, median PR throughput rose only ~8%, with most organizations seeing 5–15% gains. The gap between actual results and executive expectations (often 3x–10x) is explained by coding being only ~14% of developer time, new bottlenecks introduced by AI (review burden, technical debt, cognitive debt), and cultural/adoption friction. The discussion covers where reclaimed time actually goes, the risk of 'false velocity', how to measure AI impact (separating acceleration from augmentation), and where leaders should invest next—including applying AI beyond coding to the full SDLC and exploring autonomous agents.

15m read time · From newsletter.getdx.com
Table of contents
- Brian: One of the central themes of the SPACE framework is that developer productivity is nuanced and about much more than counting activities. So given that, why did you choose PR throughput as your primary measure?
- Brian: I suspect that, for many, a 5%, 7%, 10% increase in PR throughput matches what they’ve been feeling. But as you mentioned, when I talk to business leaders, they’re often expecting 20, 30, 40, 50% increases in productivity. So why do you think the gains are lower than what people outside the industry might expect?
- Brian: You don’t have to go far out in the distribution before you see some organizations with substantially higher gains. Any time the median and mean are that different, it implies outliers. So what set apart some of the companies with disproportionately high gains from the more typical experience?
- Brian: One of the things I’ve been seeing in my research is that, yes, we’re producing 5, 7, 10, 15% more pull requests, but we’re doing it substantially more efficiently. For hands-on keyboard time spent coding, we’re producing about 40% more PRs per hour. That hints that we are reclaiming some coding time. Do you have any idea where that time is getting reinvested?
- Brian: There’s a reason we don’t just measure lines of code as a good measure of productivity. Writing more code isn’t always the right answer. As we increase velocity, what are some of the potential unwanted side effects you’ve been seeing?
- Brian: So what’s your recommendation to engineering leaders who want to apply AI beyond just code generation? Where should they be looking?
- Brian: You and I are both huge fans of Dr. Margaret-Anne Storey’s work. She’s been talking a lot about cognitive debt and the human cost of these AI transformations. What have you been looking at in that area?
- Brian: We’ve hinted at it throughout this conversation—are we actually delivering innovation faster, or are we just moving bottlenecks around? My first reaction to an unanswered question is always: how do we measure it? Any thoughts on how leaders should be thinking about measurement as AI transforms how we work?
- Brian: That is wild. To wrap up our conversation, give me one quick finding. What’s one thing you’ve learned about making agents more effective?
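The arithmetic behind the pairing of a modest overall throughput gain with a much larger per-hour efficiency gain can be sketched as follows. The specific multipliers here are illustrative assumptions chosen to reproduce the rough magnitudes mentioned above (~8% more PRs overall, ~40% more PRs per coding hour); they are not figures from the DX study.

```python
def per_hour_gain(throughput_multiplier: float, coding_hours_multiplier: float) -> float:
    """Fractional change in PRs produced per hands-on coding hour.

    throughput_multiplier: total PR output now vs. before (e.g. 1.08 = +8%)
    coding_hours_multiplier: coding time now vs. before (e.g. 0.77 = -23%)
    """
    return throughput_multiplier / coding_hours_multiplier - 1.0

# Assumed for illustration: PR output up 8% while coding time fell ~23%.
gain = per_hour_gain(1.08, 0.77)
print(f"PRs per coding hour: {gain:+.0%}")  # roughly +40%
```

The point of the sketch is that per-hour efficiency is a ratio: even a small rise in output divided by a meaningfully smaller amount of coding time yields a large efficiency gain, which is why the reclaimed time (and where it goes) matters more than the headline throughput number.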
