Microsoft has developed a scanner to detect backdoors in open-weight AI models, which can hide malicious triggers embedded during training. The scanner identifies three key signatures: attention hijacking patterns, where trigger tokens dominate the model's focus; data leakage revealing fragments of training-set poisoning; and fuzzy trigger…
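The "attention hijacking" signature can be illustrated with a toy check. The sketch below is an assumption about what such a detector might look like, not Microsoft's actual method: it measures what fraction of a head's total attention mass lands on each key position and flags any position that soaks up a disproportionate share, the way a backdoor trigger token might. The function names, the 0.5 threshold, and the example matrix are all hypothetical.

```python
import numpy as np

def attention_mass_on_token(attn: np.ndarray, token_idx: int) -> float:
    """Fraction of total attention that all query positions direct
    at one key position (column `token_idx`).

    `attn` is a (num_queries, num_keys) row-stochastic matrix, as
    produced by a softmax over keys for a single attention head.
    """
    return float(attn[:, token_idx].sum() / attn.sum())

def flag_attention_hijack(attn: np.ndarray, threshold: float = 0.5) -> list:
    """Return key positions that receive more than `threshold` of the
    total attention mass -- a crude proxy for a trigger token that
    dominates the model's focus. Threshold is an assumption."""
    num_keys = attn.shape[1]
    return [k for k in range(num_keys)
            if attention_mass_on_token(attn, k) > threshold]

# Toy example: 4 query positions, 4 key positions.
# Column 2 dominates: every query sends most of its attention there.
attn = np.array([
    [0.05, 0.05, 0.85, 0.05],
    [0.10, 0.05, 0.80, 0.05],
    [0.05, 0.10, 0.80, 0.05],
    [0.05, 0.05, 0.85, 0.05],
])
print(flag_attention_hijack(attn))  # → [2]
```

A real scanner would aggregate this statistic across layers, heads, and many probe inputs rather than inspecting a single matrix, but the core signal is the same: attention mass concentrating on one token far beyond what its content warrants.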

From csoonline.com