19 large language models for safety or danger
A survey of 19 LLMs spanning the full spectrum of AI safety approaches. On the safer end, LlamaGuard, Granite Guardian, Claude, WildGuard, ShieldGemma, NeMo Guardrails, Qwen3Guard, PIGuard, PIIGuard, Alinia, and DuoGuard each offer specialized moderation, content filtering, or prompt-injection defense. On the looser end, Dolphin, Nous Hermes, Flux.1, Heretic, Pingu Unchained, Cydonia, and Midnight Rose reduce or remove restrictions for use cases like security research, roleplay, and unconstrained reasoning. The piece also covers abliteration, a technique that zeros out guardrail weights rather than retraining the model, and Grok's truth-maximizing design philosophy.