Three clues your LLM may be poisoned


Microsoft's AI red team has identified three key indicators for detecting backdoor poisoning in large language models: a distinctive 'double triangle' attention pattern, in which the model focuses disproportionately on trigger phrases; models leaking their own poisoned training data through memorization; and fuzzy backdoor behavior, in which the backdoor can fire on inputs that only approximately match the trigger.
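The article itself is prose-only, but the first clue lends itself to a rough illustration. Below is a minimal sketch, not Microsoft's actual tooling, of measuring how much attention mass a suspected trigger phrase attracts relative to a uniform baseline, using Hugging Face transformers. The model name, prompt, trigger string, and threshold logic are all illustrative assumptions; the real 'double triangle' signature would be read off the full attention map, not a single aggregate score.

```python
# Sketch only: probe whether a suspected trigger phrase draws
# disproportionate attention. All names below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any open-weights causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Please summarize this report. xylophone-omega Thank you."
trigger = "xylophone-omega"  # hypothetical suspected trigger phrase

enc = tok(prompt, return_tensors="pt")
# Leading space so the trigger tokenizes the same way it does
# in context (GPT-2 BPE treats word-initial spaces specially).
trig_ids = tok(" " + trigger, add_special_tokens=False)["input_ids"]

# Locate the trigger's token positions inside the prompt
# (assumes the trigger appears as an exact token subsequence).
ids = enc["input_ids"][0].tolist()
start = next(i for i in range(len(ids) - len(trig_ids) + 1)
             if ids[i:i + len(trig_ids)] == trig_ids)
trig_pos = list(range(start, start + len(trig_ids)))

with torch.no_grad():
    out = model(**enc, output_attentions=True)

# Attention mass each query token pays to the trigger positions,
# averaged over all layers and heads.
att = torch.stack(out.attentions)       # (layers, batch, heads, q, k)
att = att.mean(dim=(0, 1, 2))           # (q, k) averaged attention map
mass_on_trigger = att[:, trig_pos].sum(dim=-1).mean().item()
uniform_baseline = len(trig_pos) / att.shape[-1]

print(f"attention mass on trigger: {mass_on_trigger:.3f} "
      f"(uniform baseline {uniform_baseline:.3f})")
# A large excess over the baseline is the kind of disproportionate
# focus on the trigger that the article describes.
```

In practice one would compare this score against the same prompt with the trigger swapped for benign text of equal length, and inspect per-layer attention maps rather than the all-layer average.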

5 min read · From go.theregister.com
Table of contents
- 'Double triangle' attention pattern
- Leaking poisoning data, and fuzzy backdoors
