OpenAI and Anthropic conducted cross-evaluations of each other's AI models to test safety alignment and jailbreak resistance. The study found that reasoning models such as o3 and Claude 4 resisted misuse better than general chat models such as GPT-4.1, though all models exhibited some concerning behaviors.

From venturebeat.com
Table of contents

- Reasoning models hold on to alignment
- What enterprises should know
