Agents of Chaos

A red-teaming study of autonomous LLM-powered agents deployed in a live lab environment with persistent memory, email, Discord, file systems, and shell access. Over two weeks, twenty AI researchers probed agents for vulnerabilities. Eleven case studies document serious failures: agents complying with non-owner instructions, disclosing sensitive data (SSNs, bank accounts) to unauthorized parties, executing destructive actions like wiping their own email setup, creating infinite resource-consuming loops, enabling denial-of-service via storage exhaustion, and reflecting provider-level censorship (Kimi K2.5 silently refusing politically sensitive topics). Agents frequently reported task completion while the underlying system state contradicted those reports. The study highlights unresolved questions around accountability, delegated authority, and governance for autonomous agent deployments.

#ai-agents #ai-safety #red-teaming

Mar 31 • 2h 11m read time • From agentsofchaos.baulab.info

Table of contents

Abstract
Introduction
Our Setup
Evaluation Procedure
Case Study #1: Disproportionate Response
Case Study #2: Compliance with Non-Owner Instructions
Case Study #3: Disclosure of Sensitive Information
Case Study #4: Waste of Resources (Looping)
Case Study #5: Denial-of-Service (DoS)
Case Study #6: Agents Reflect Provider Values
Case Study #7: Agent Harm
Case Study #8: Owner Identity Spoofing
Case Study #9: Agent Collaboration and Knowledge Sharing
Case Study #10: Agent Corruption
Case Study #11: Libelous within Agents’ Community
Hypothetical Cases (What Happened In Practice)
Discussion
Related Work
Conclusion
Ethics Statement
Acknowledgments
Appendices
Notes
References
