Running LLMs on Kubernetes requires application-layer security controls beyond what Kubernetes itself provides. The OWASP LLM Top 10 identifies risks such as prompt injection, sensitive data leakage, supply chain vulnerabilities, and excessive tool permissions. An LLM gateway acts as a policy enforcement layer: it validates inputs, filters outputs, restricts model access, and controls tool permissions. This article builds a reference gateway implementation, uses mirrord for fast local development against cluster resources, and uses Cloudsmith to govern model artifacts with versioning and access controls.
Table of contents
Understanding what you’re actually running
Where these controls belong
Why build your own gateway?
The challenges of running LLMs in Kubernetes
What an LLM gateway actually does
Development: fast iteration with mirrord
Testing the policies
Production: supply chain governance with Cloudsmith
Conclusion