Marginlab provides a daily tracker for Claude Code with Opus 4.5 that monitors its results on SWE-Bench-Pro tasks to detect statistically significant degradations. The tracker runs daily benchmarks on 50 test instances using the actual Claude Code CLI (no custom harnesses) and applies statistical significance testing.
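
The summary does not say which significance test the tracker uses. As a rough illustration only, one way to check a single day's result against a reference pass rate is an exact one-sided binomial test. The names `binomial_lower_tail` and `is_degraded`, the baseline rate of 0.60, and the 0.05 threshold are all hypothetical placeholders, not values from Marginlab.

```python
from math import comb


def binomial_lower_tail(passes: int, n: int, baseline_rate: float) -> float:
    """Return P(X <= passes) for X ~ Binomial(n, baseline_rate)."""
    return sum(
        comb(n, k) * baseline_rate**k * (1 - baseline_rate) ** (n - k)
        for k in range(passes + 1)
    )


def is_degraded(
    passes: int,
    n: int = 50,                 # instances per daily run, per the summary
    baseline_rate: float = 0.60, # hypothetical reference pass rate
    alpha: float = 0.05,         # hypothetical significance threshold
) -> bool:
    """Flag a run whose pass count is improbably low under the baseline."""
    return binomial_lower_tail(passes, n, baseline_rate) < alpha


if __name__ == "__main__":
    for passes in (30, 20):
        p_value = binomial_lower_tail(passes, 50, 0.60)
        print(passes, round(p_value, 4), is_degraded(passes))
```

With only 50 instances per run, a test like this can only flag fairly large drops; a real tracker might pool several days or use a different test entirely.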

2m read time · From marginlab.ai