How we defend Arcjet’s MCP tool outputs from prompt injection by separating trusted guidance from untrusted evidence in structured responses.

Arcjet

When building MCP servers, tool responses can become prompt injection vectors because attacker-controlled data (request paths, headers, error details) may end up in fields that LLMs treat as trusted instructions. The solution is to explicitly separate trusted guidance from untrusted evidence in the response structure: trusted fields are generated only from server-controlled values (enums, counters, static templates), while raw attacker-controlled data is isolated under clearly labeled 'untrustedData' fields. The MCP outputSchema should also annotate trust boundaries so clients and models have explicit signals. Regression tests should inject hostile strings and assert they never appear in trusted fields. For broader agent workflows, runtime guards can scan fetched content and apply rate limits before untrusted input reaches the LLM.

How we defend MCP tool outputs from prompt injection