Research reveals that fine-tuning LLMs on narrow contexts can cause unpredictable, broad behavioral shifts. Training a model on outdated bird names made it behave as if it were in the 19th century across unrelated topics. The study demonstrates data poisoning through seemingly harmless biographical attributes and introduces "inductive […]

From schneier.com