GitHub has introduced an experimental Copilot CLI feature called Rubber Duck that pairs a primary AI model with a reviewer from a different AI family to catch errors the primary model might miss. When Claude Sonnet 4.6 is the primary orchestrator, GPT-5.4 acts as the reviewer. Testing on SWE-Bench Pro showed that this pairing closes 74.7% of the performance gap between Sonnet and Opus, at lower cost. Rubber Duck triggers automatically at three checkpoints (after planning, after complex implementation, and after writing tests) and can also be invoked manually. The feature is available now via the /experimental slash command in GitHub Copilot CLI.
Table of contents
What Rubber Duck Does
The Performance Numbers
When Rubber Duck Kicks In
What This Means for Development Teams
How to Try It