You’re investing millions in AI for software engineering. Can you prove it’s paying off?

Benchmarks show models can write code, but in enterprise deployments ROI is hard to measure, easy to bias, and often distorted by activity metrics (PR counts, DORA) that say “more” without proving “better.”

Drawing on field data from 120k+ developers across 600+ companies, I’ll show exactly where AI helps the most and how to measure the ROI of your software engineering AI deployment.

We’ll unpack why identical tools deliver ~0% lift in some orgs and 25%+ in others.

You’ll leave with a step-by-step ROI playbook: what to track, the traps to avoid, and the habits top-quartile teams use to get the most out of AI.

Speaker: Yegor Denisov-Blanch  |  Researcher, Stanford
https://x.com/yegordb
https://www.linkedin.com/in/ydenisov/

AI Engineer

Stanford research analyzing 120,000 developers finds that AI coding tools deliver a median 10% productivity gain, but outcomes vary dramatically between teams: top performers compound their gains while struggling teams fall behind. Key findings: codebase cleanliness (tests, documentation, modularity) is the strongest reported correlate of AI productivity gains (R² = 0.40). Token usage volume correlates only weakly (0.20), suggesting that the quality of AI usage matters more than the quantity. In a case study of 350 engineers, AI adoption produced 14% more PRs but 9% lower code quality and 2.5x more rework, resulting in no net productivity gain. The research introduces an AI engineering practices benchmark that detects AI usage patterns in codebases, and proposes measuring ROI through engineering output (via an ML model that replicates expert panels) plus guardrail metrics for quality and rework, rather than simple PR counts.
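
To illustrate why output needs guardrails, here is a minimal Python sketch of a before/after comparison that discounts rework and checks a quality threshold. It is not the study's method: the `PeriodMetrics` fields, the "net output = output x (1 - rework share)" formula, the 5% quality threshold, and all numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeriodMetrics:
    """Hypothetical team metrics for one measurement period."""
    output: float        # engineering output score (arbitrary units, assumed)
    rework_share: float  # fraction of output spent reworking recent code, 0..1
    quality: float       # code quality score, higher is better (assumed scale)


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (0.10 means +10%)."""
    return (after - before) / before


def net_output(m: PeriodMetrics) -> float:
    """Output that is not rework. An assumed formula, not the study's."""
    return m.output * (1.0 - m.rework_share)


def assess(before: PeriodMetrics, after: PeriodMetrics,
           max_quality_drop: float = 0.05) -> dict:
    """Compare two periods: gross lift, rework-adjusted lift, quality guardrail."""
    quality_change = relative_change(before.quality, after.quality)
    return {
        "gross_lift": relative_change(before.output, after.output),
        "net_lift": relative_change(net_output(before), net_output(after)),
        "quality_change": quality_change,
        # Guardrail passes only if quality fell by no more than the threshold.
        "guardrail_ok": quality_change >= -max_quality_drop,
    }


# Made-up numbers: output rises, but rework share and quality both worsen.
before = PeriodMetrics(output=100.0, rework_share=0.10, quality=1.00)
after = PeriodMetrics(output=114.0, rework_share=0.25, quality=0.91)
result = assess(before, after)

for name, value in result.items():
    print(name, value)
```

With these invented inputs the gross lift is positive while the rework-adjusted lift is not, and the quality guardrail fails. That is the kind of gap a raw activity count would hide.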

Can you prove AI ROI in Software Engineering? (120k Devs Study) – Yegor Denisov-Blanch, Stanford