Unit 42 researchers red-teamed the multi-agent collaboration feature of Amazon Bedrock Agents and demonstrated a four-stage attack chain: detecting the operating mode (Supervisor vs. Supervisor with Routing), discovering collaborator agents, delivering attacker-controlled payloads, and exploiting the target agents. Successful attacks included extracting agent instructions, leaking tool schemas, and invoking tools with malicious inputs. No vulnerabilities were found in Bedrock itself. Instead, the attacks exploit a fundamental LLM limitation: the model cannot reliably distinguish developer instructions from adversarial input. Enabling Bedrock's built-in prompt attack Guardrail together with its pre-processing prompt blocked all of the demonstrated attacks. Recommended defenses include narrowly scoping each agent's capabilities, sanitizing tool inputs, running vulnerability scans, and applying least-privilege permissions.
Table of contents
Executive Summary
Introduction to Bedrock Agents Multi-Agent Collaboration
Red-Teaming Multi-Agent Application
General Defenses and Mitigations
Conclusion
Additional Resources