An analysis of how analytics agents reason when solving text-to-SQL problems, based on a 50-question sample from the BIRD-Bench benchmark. Chain-of-thought traces were generated with Claude Opus 4.5 using the MotherDuck MCP Server, then classified by a team of Claude sub-agents acting as judges. Key findings: single-shot