Microsoft developed a scanner to detect backdoors in open-weight AI models that can hide malicious triggers embedded during training. The scanner identifies three key signatures: attention hijacking patterns where trigger tokens dominate model focus, data leakage revealing training poisoning fragments, and fuzzy trigger
•3m read time• From csoonline.com