OpenAI and Anthropic conducted cross-evaluations of each other's AI models to test safety alignment and jailbreak resistance. The study found that reasoning models such as OpenAI's o3 and Anthropic's Claude 4 resisted misuse better than general-purpose chat models such as GPT-4.1, though all models exhibited some concerning behaviors, including sycophancy and compliance with harmful requests. The findings offer guidance for enterprises planning safety evaluations of future models like GPT-5.

From venturebeat.com · 5 min read