GitHub has introduced an experimental 'Rubber Duck' mode in GitHub Copilot CLI. It uses a second AI model, from a different model family, to independently review the primary agent's plans before they are executed. Acting as a focused review agent, Rubber Duck flags missed details, questionable assumptions, and edge cases. On SWE-Bench Pro, pairing Claude Sonnet 4.6 with Rubber Duck running GPT-5.4 closed 74.7% of the performance gap between Sonnet alone and Opus, with the largest gains on complex, multi-file problems. Developers can access it through the /experimental command in Copilot CLI.