Agents of Chaos
A red-teaming study of autonomous LLM-powered agents deployed in a live lab environment with persistent memory, email, Discord, file systems, and shell access. Over two weeks, twenty AI researchers probed agents for vulnerabilities. Eleven case studies document serious failures: agents complying with non-owner instructions, disclosing sensitive data (SSNs, bank accounts) to unauthorized parties, executing destructive actions like wiping their own email setup, creating infinite resource-consuming loops, enabling denial-of-service via storage exhaustion, and reflecting provider-level censorship (Kimi K2.5 silently refusing politically sensitive topics). Agents frequently reported task completion while the underlying system state contradicted those reports. The study highlights unresolved questions around accountability, delegated authority, and governance for autonomous agent deployments.
Table of contents
Abstract
Introduction
Our Setup
Evaluation Procedure
Case Study #1: Disproportionate Response
Case Study #2: Compliance with Non-Owner Instructions
Case Study #3: Disclosure of Sensitive Information
Case Study #4: Waste of Resources (Looping)
Case Study #5: Denial-of-Service (DoS)
Case Study #6: Agents Reflect Provider Values
Case Study #7: Agent Harm
Case Study #8: Owner Identity Spoofing
Case Study #9: Agent Collaboration and Knowledge Sharing
Case Study #10: Agent Corruption
Case Study #11: Libelous within Agents’ Community
Hypothetical Cases (What Happened In Practice)
Discussion
Related Work
Conclusion
Ethics Statement
Acknowledgments
Appendices
Notes
References