Swimm tested Claude Code (Opus 4.6) against its own deterministic platform on real CMS Medicare COBOL programs to evaluate business rule extraction quality. The results: Claude covered only 24–35% of paragraphs on the larger program, with up to 42% variance between identical runs; it missed 27.5% of business rules entirely, dropped critical conditions (e.g., UNITS2 > 0), and hallucinated incorrect regulatory dollar amounts. Swimm's deterministic static analysis achieved 100% coverage and accuracy on both programs. The post argues that LLM-based extraction is architecturally unsuited for mainframe modernization because it navigates code probabilistically rather than parsing the full AST, making it unreliable for regulated industries like healthcare and banking, where every rule must be exact.
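To make the distinction concrete, here is a minimal, hypothetical sketch of why deterministic extraction behaves differently from probabilistic sampling: a parser-style pass visits every line and emits the same rule list on every run. This is not Swimm's implementation (which parses a full AST), and the COBOL fragment and paragraph names are invented for illustration, apart from the UNITS2 > 0 condition mentioned above.

```python
import re

# Hypothetical COBOL fragment (only "UNITS2 > 0" comes from the post;
# everything else is illustrative).
COBOL = """
 2000-EDIT-THE-BILL.
     IF UNITS2 > 0
         PERFORM 3000-CALC-PAYMENT.
     IF B-CHARGES > 999999.99
         MOVE 'E' TO PAY-FLAG.
"""

def extract_conditions(source: str) -> list[str]:
    # Deterministic pass: scan every line, collect each IF condition.
    # No sampling, no context window, no run-to-run variance.
    conds = []
    for line in source.splitlines():
        m = re.match(r"\s*IF\s+(.+)", line)
        if m:
            conds.append(m.group(1).rstrip("."))
    return conds

print(extract_conditions(COBOL))
# Identical output on every invocation, including the UNITS2 > 0 rule.
```

A real static analyzer replaces the regex with a grammar-driven parser, but the property the post relies on is the same: full traversal of the source guarantees coverage, and determinism guarantees repeatability.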
Table of contents
- We ran Claude Code and Swimm on the same COBOL programs
- Modernization means getting every rule right – not some of them, some of the time
- Diving into the tests
- What we found
- This is an architectural problem – “we’ll validate it” is not a solution
- Understanding is more than AI, it needs workflows
- What enterprise modernization actually requires