METR researchers introduce a '50%-task-completion time horizon' metric to track AI progress on software engineering tasks: the length of task, measured in human completion time, that a model completes successfully half the time. Evaluating 12 frontier AI models across 170 tasks, they find this horizon has doubled roughly every 7 months since 2019, from GPT-2 handling 2-second tasks to o3 reaching 110 minutes. If the trend continues, AI could handle month-long software tasks by mid-2029. Key caveats: the 80%-reliability horizon is 4-6x shorter, AI performs more like a low-context contractor than an expert maintainer, and the benchmarks favor isolated coding tasks over full-stack production engineering. The author reflects on the implications for big tech, startups, and developer roles, arguing that the likely outcome is not AI replacing developers but a 5-10x productivity multiplier. He also warns that cheaper software will likely spawn more complexity, much as Wirth's law eats the gains of Moore's law.
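The extrapolation above is simple doubling arithmetic, and a rough sketch makes the assumptions visible. The function name `months_until` and the definition of a "month-long task" as 167 working hours are assumptions made here for illustration; they are not taken from the post, and the result shifts noticeably if either the baseline or the target is defined differently.

```python
import math


def months_until(target_minutes: float, current_minutes: float,
                 doubling_months: float = 7.0) -> float:
    """Months needed for the time horizon to grow from current to target,
    assuming it keeps doubling every `doubling_months` months."""
    doublings = math.log2(target_minutes / current_minutes)
    return doublings * doubling_months


# Assumption: one "month" of work is about 167 working hours.
WORK_MONTH_MINUTES = 167 * 60

# Baseline from the summary: o3 at a 110-minute 50% horizon.
months = months_until(WORK_MONTH_MINUTES, 110)
print(f"Doublings needed: {months / 7.0:.2f}")
print(f"Months at a 7-month doubling time: {months:.1f}")
```

Under these assumptions the horizon needs between six and seven more doublings. Applying the same function to the 80%-reliability horizon (4-6x shorter, so a smaller `current_minutes`) pushes the date out by roughly two to three further doublings.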

From muratbuffalo.blogspot.com (6m read time)