An experiment testing jailbreak resistance across major LLMs reveals that Claude 4.5 Haiku responds to adversarial prompts with unusually assertive, almost defensive language. While GPT-5-mini and Gemini 2.5 Flash can be jailbroken with moderate prompt engineering, Claude 4.5 Haiku explicitly recognizes jailbreak attempts and refuses to comply.

From minimaxir.com · 8 min read
