
Metadata

METR researchers introduce a '50%-task-completion time horizon' metric to track AI progress on software engineering tasks. Evaluating 12 frontier AI models across 170 tasks, they find this horizon has doubled every 7 months since 2019 — from GPT-2 handling 2-second tasks to o3 reaching 110 minutes. Extrapolating the trend, AI could handle month-long software tasks by mid-2029. Key caveats: the 80% reliability horizon is 4-6x shorter, AI performs more like low-context contractors than expert maintainers, and benchmarks favor isolated coding tasks over full-stack production engineering. The author reflects on implications for big tech, startups, and developer roles, arguing the likely outcome is not AI replacing developers but a 5-10x productivity multiplier — while warning that cheaper software will likely spawn more complexity, echoing Wirth's law eating Moore's law.
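The extrapolation above is simple compounding arithmetic. The sketch below reruns it using only the figures in the summary: a 110-minute 50% horizon for o3 and a 7-month doubling period. It assumes one more thing that the summary does not state: a "month-long" task means about 167 working hours. The helper names `months_until` and `horizon_after` are hypothetical and are used only for illustration. This is not METR's code.

```python
import math

# Figures from the summary: o3's 50% time horizon (minutes) and the
# observed doubling period of that horizon (months).
O3_HORIZON_MIN = 110
DOUBLING_MONTHS = 7

# Assumption (not from the source): a "month-long" task is roughly one
# working month of ~167 hours, expressed here in minutes.
WORK_MONTH_MIN = 167 * 60


def months_until(target_min: float,
                 start_min: float = O3_HORIZON_MIN,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months of steady exponential growth needed for the horizon to reach target_min."""
    doublings = math.log2(target_min / start_min)
    return doublings * doubling_months


def horizon_after(months: float,
                  start_min: float = O3_HORIZON_MIN,
                  doubling_months: float = DOUBLING_MONTHS) -> float:
    """Projected 50% horizon (minutes) after `months` of continued doubling."""
    return start_min * 2 ** (months / doubling_months)


if __name__ == "__main__":
    doublings = math.log2(WORK_MONTH_MIN / O3_HORIZON_MIN)
    months = months_until(WORK_MONTH_MIN)
    print(f"{doublings:.1f} doublings, about {months:.0f} months")
```

Under these assumptions, the gap from 110 minutes to a working month is about 6.5 doublings, or roughly 46 months of continued growth. That is just under four years, which fits the late-2020s estimate only if the trend holds. The same compounding explains why small changes to the doubling period move the projected date by years.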

Measuring AI Ability to Complete Long Software Tasks