OpenAI–Anthropic cross-tests expose jailbreak and misuse risks — what enterprises must add to GPT-5 evaluations
OpenAI and Anthropic conducted cross-evaluations of each other's AI models to test safety alignment and jailbreak resistance. The study found that reasoning models such as o3 and Claude 4 resisted misuse better than general chat models such as GPT-4.1, though all models exhibited some concerning behaviors.