•From x.com

TestingCatalog News 🗞 @testingcatalog
BREAKING 🚨: ANTHROPIC ANNOUNCED CYBERSECURITY PROJECT GLASSWING AND MYTHOS BENCHMARKS! Claude Mythos scored 93.9% on SWE-bench Verified and 87.3% on SWE-bench Multilingual! “We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale”