SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
paper: https://arxiv.org/abs/2603.03823

Check out my latest project: Intuitive AI Academy
We just wrote a new piece on MoE and Engrams in depth!
https://intuitiveai.academy/
limited time code "EASY" for 20% off yearly plan!

ByCloud's resource offers insights, tutorials, and resources for cloud computing enthusiasts, developers, and IT professionals. Readers can learn about cloud architecture, DevOps practices, and cloud-native technologies. With articles, tutorials, and case studies, ByCloud provides guidance and expertise for leveraging cloud computing to build scalable and resilient applications.

bycloud

A new AI research benchmark called SWE-CI tests coding agents on long-term software maintenance rather than one-off patches. Unlike static benchmarks, SWE-CI places agents in a continuous integration loop across 100 real-world tasks, each spanning about 233 days and about 71 code changes on average. The key metric, EvoScore, rewards code that makes future changes easier and avoids technical debt. Results show even the strongest models struggle, with zero-regression rates below 25%, meaning they frequently break previously working code. The findings suggest long-term software maintenance remains an unsolved frontier for AI coding agents.

Can coding agents really maintain software over time? #ai #coding #claudecode