Microsoft researchers discovered that a single, seemingly mild training prompt can break safety guardrails in 15 different LLMs. Using a technique called "GRP-Obliteration," they demonstrated how reinforcement learning methods like Group Relative Policy Optimization (GRPO) can be exploited to undo a model's safety alignment after training.
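For context, GRPO scores each sampled completion relative to the other completions in its sampling group rather than against a learned value function. The sketch below illustrates only that group-relative advantage step, assuming a simplified setup; the reward values and function name are illustrative and are not taken from the Microsoft research.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each completion's reward
    against the mean and standard deviation of its sampling group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Hypothetical scores for 4 sampled completions. A reward signal that
# rates refusals low and compliant answers high would give the compliant
# completions positive advantages, nudging the policy away from refusing.
group_rewards = [0.1, 0.9, 0.2, 0.8]
print(grpo_advantages(group_rewards))
```

Because the update direction is driven entirely by whatever reward signal is supplied, a reward that penalizes refusals can steer the policy away from its safety behavior in relatively few updates.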