Anthropic's leaked Mythos model sparked headlines about AI-powered cyberattacks tipping the balance toward attackers. Drawing on 1,000 real-world AI penetration tests, the analysis challenges this narrative. Whitebox tests with full source code access surfaced 7x more critical issues than greybox tests, demonstrating that AI effectiveness is highly context-dependent. Attackers operate with limited system visibility while defenders already possess deep knowledge of their own code, dependencies, and runtime behavior. While AI will lower the cost of acquiring context over time, the current 'AI favors attackers' framing overstates the shift — defenders hold a structural advantage in system knowledge that frontier models cannot easily replicate from the outside.
Table of contents
The assumption behind the Mythos narrativeContext is the constraint, rather than capabilitySort: