The author describes using Claude to tackle a backlog of flaky test failures in RavenDB's 32,000-test CI suite. After impulsively pasting all failing test issues into Claude, they received a real pull request with genuine fixes — including one that identified a subtle race condition where the test was waiting on the wrong resource. While not all fixes were good and human review is still required, Claude compressed what would have been one to two weeks of developer work. The team is now integrating Claude into their test maintenance workflow: failures are routed to Claude for a first-pass triage and repair before a human reviews the output.

6m read timeFrom ayende.com
Post cover image

Sort: