Anthropic's leaked Mythos model has triggered panic about AI-powered cyberattacks. We ran 1,000 AI penetration tests. The results suggest the threat is more nuanced than the headlines claim.

Aikido Security

Anthropic's leaked Mythos model sparked headlines about AI-powered cyberattacks tipping the balance toward attackers. Drawing on 1,000 real-world AI penetration tests, the analysis challenges this narrative. Whitebox tests with full source code access surfaced 7x more critical issues than greybox tests, demonstrating that AI effectiveness is highly context-dependent. Attackers operate with limited system visibility while defenders already possess deep knowledge of their own code, dependencies, and runtime behavior. While AI will lower the cost of acquiring context over time, the current 'AI favors attackers' framing overstates the shift — defenders hold a structural advantage in system knowledge that frontier models cannot easily replicate from the outside.

Anthropic Mythos Cybersecurity Risks: What 1,000 AI Pentests Actually Show

The assumption behind the Mythos narrative

Context is the constraint, rather than capability