After ten months of using GitHub Copilot Coding Agent (CCA) in the dotnet/runtime repository, the .NET team shares detailed data and lessons from 878 CCA pull requests (535 merged, 67.9% success rate). Key findings: cleanup and removal tasks have the highest success rate (84.7%), while performance tasks are the hardest (54.5%); proper setup instructions raised the success rate from 41.7% to roughly 71%; CCA excels at well-scoped mechanical tasks but struggles with architectural judgment; 65.7% of CCA-added lines are test code; and the bottleneck has shifted from code generation to code review. The post also covers specific experiments, such as assigning issues from a phone during a flight, the importance of copilot-instructions.md, and the challenge of AI-generated tests that may encode incorrect behavior.
Table of contents
The Numbers at a Glance
The Birthday Party Experiment
The Redmond Flight Experiment
The Power of Instructions
What Works: The Sweet Spots
What Struggles: The Challenging Areas
The People Behind the Numbers
The Autonomy Question
Code Review
Greenfield vs. Brownfield: A Tale of Two Codebases
The Laziness Problem
When “Closed” Is Actually Success
Lessons for Individual Contributors
Ten Months, 878 PRs, One Takeaway