Building enterprise AI agents requires a defense-in-depth safety architecture. This post shares lessons from implementing a two-layer shield system with Llama Stack: PromptGuard (86M parameters, CPU-based) as the first layer, detecting prompt injections and jailbreaks, and Llama Guard 3 8B (GPU-based) as the second layer, covering content safety across 14 categories.

Key findings include the need to exclude certain safety categories (Privacy, Specialized Advice, Code Interpreter Abuse, Self-Harm) to prevent false positives in legitimate IT workflows such as employee lookups. A notable challenge arose with LangGraph-based small-prompt architectures, where PromptGuard flagged internal state-machine prompts as injection attempts instead of evaluating the actual user input. The post emphasizes that safety shields require careful tuning per use case and cannot simply be enabled and left unattended.
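As a rough illustration of the two-layer flow described above, the sketch below chains both shields through the Llama Stack client's safety API. It is not the post's code: the shield IDs, server URL, and messages are assumptions for the example, and the Llama Guard category exclusions are assumed to be set in the shield's provider configuration rather than passed at call time.

```python
# Minimal sketch of a two-layer shield check, assuming shields registered
# with the Llama Stack server under the IDs "prompt-guard" and
# "llama-guard-3-8b" (IDs and URL are illustrative).
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")


def check_input(user_message: str) -> bool:
    """Return True if the message passes both shields, False otherwise."""
    messages = [{"role": "user", "content": user_message}]

    # Layer 1: PromptGuard (small, CPU-based) screens for prompt injection
    # and jailbreak attempts before anything reaches the GPU.
    layer1 = client.safety.run_shield(
        shield_id="prompt-guard", messages=messages, params={}
    )
    if layer1.violation:
        print(f"Blocked by PromptGuard: {layer1.violation.user_message}")
        return False

    # Layer 2: Llama Guard 3 8B (GPU-based) checks content safety.
    # Category exclusions (e.g. Specialized Advice, Privacy, Self-Harm,
    # Code Interpreter Abuse) are assumed to be configured on the shield's
    # provider, not here.
    layer2 = client.safety.run_shield(
        shield_id="llama-guard-3-8b", messages=messages, params={}
    )
    if layer2.violation:
        print(f"Blocked by Llama Guard: {layer2.violation.user_message}")
        return False

    return True
```

Ordering the layers this way means most hostile inputs are rejected by the cheap CPU model and never consume GPU time on the larger one.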

11 min read · From developers.redhat.com
Table of contents
- About AI quickstarts
- Llama Stack shield architecture
- Configuration: Defense-in-depth in practice
- The false positive problem: When safety becomes too safe
- The small prompt challenge: Shield validation issues
- Closing thoughts
- Next steps
