GitHub has introduced an experimental Copilot CLI feature called Rubber Duck that pairs a primary AI model with a reviewer from a different AI family to catch errors the primary model might miss. When Claude Sonnet 4.6 is the primary orchestrator, GPT-5.4 acts as the reviewer. Testing on SWE-Bench Pro showed this pairing closes 74.7% of the performance gap between Sonnet and Opus, at lower cost. Rubber Duck triggers automatically at three checkpoints — after planning, after complex implementation, and after writing tests — and can also be invoked manually. The feature is available now via the /experimental slash command in GitHub Copilot CLI.

5m read time · From devops.com
Table of contents
What Rubber Duck Does
The Performance Numbers
When Rubber Duck Kicks In
What This Means for Development Teams
How to Try It
