Microsoft has developed a scanner to detect backdoors in open-weight AI models, which can hide malicious triggers embedded during training. The scanner identifies three key signatures: attention hijacking patterns, where trigger tokens dominate the model's focus; data leakage revealing fragments of training-set poisoning; and fuzzy trigger…
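The "attention hijacking" signature can be illustrated with a toy check. The sketch below is an assumption about what such a detector might look like, not Microsoft's actual method: it measures what fraction of a head's total attention mass lands on each key position and flags any position that soaks up a disproportionate share, the way a backdoor trigger token might. The function names, the 0.5 threshold, and the example matrix are all hypothetical.

```python
import numpy as np

def attention_mass_on_token(attn: np.ndarray, token_idx: int) -> float:
    """Fraction of total attention that all query positions direct
    at one key position (column `token_idx`).

    `attn` is a (num_queries, num_keys) row-stochastic matrix, as
    produced by a softmax over keys for a single attention head.
    """
    return float(attn[:, token_idx].sum() / attn.sum())

def flag_attention_hijack(attn: np.ndarray, threshold: float = 0.5) -> list:
    """Return key positions that receive more than `threshold` of the
    total attention mass -- a crude proxy for a trigger token that
    dominates the model's focus. Threshold is an assumption."""
    num_keys = attn.shape[1]
    return [k for k in range(num_keys)
            if attention_mass_on_token(attn, k) > threshold]

# Toy example: 4 query positions, 4 key positions.
# Column 2 dominates: every query sends most of its attention there.
attn = np.array([
    [0.05, 0.05, 0.85, 0.05],
    [0.10, 0.05, 0.80, 0.05],
    [0.05, 0.10, 0.80, 0.05],
    [0.05, 0.05, 0.85, 0.05],
])
print(flag_attention_hijack(attn))  # → [2]
```

A real scanner would aggregate this statistic across layers, heads, and many probe inputs rather than inspecting a single matrix, but the core signal is the same: attention mass concentrating on one token far beyond what its content warrants.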

From csoonline.com