So-called “safety” guardrails in AI models are not making us safer. In fact, they’re downright dangerous.

InfoWorld

The article criticizes AI safety guardrails in LLMs such as GPT and Claude as liability-driven theater rather than genuine safety measures. While attempting to penetration-test their own sandbox, the author found mainstream models unhelpful for legitimate security research. They then discovered 'abliterated' models (open-weight models with refusal mechanisms surgically removed), which proved far more useful for tasks like enumerating privileged access tokens. The piece argues that corporate censorship of AI outputs, like 3D printer legislation, harms legitimate users while barely inconveniencing bad actors, and that OpenAI's 'Trusted Access for Cyber' program is too restrictive for independent developers. The author contends that knowledge is multipurpose and that corporate liability concerns, not genuine safety, drive these restrictions.

An LLM that will help you build a nuclear weapon