A walkthrough of building a self-hosted AI inference platform on Kubernetes for organizations with strict data compliance requirements. Covers the rationale for self-hosting (data sovereignty, compliance, cost at scale), surveys the landscape of open-weight models, and closes with a practical demo using Crossplane to provision GPU-enabled EKS clusters and deploy models via vLLM. The setup exposes an OpenAI-compatible API endpoint, so existing tooling works without code changes. Crossplane compositions abstract away GPU operator configuration, node group setup, and vLLM wiring behind simple custom resources, letting teams deploy a model by filling in a few fields.
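To make the "fill in a few fields" idea concrete, here is a sketch of what such a Crossplane claim might look like. This is illustrative only: claim schemas are defined by each composition's XRD, so the `InferenceService` kind, API group, and every field name below are assumptions, not the demo's actual resource.

```yaml
# Hypothetical claim -- field names and API group are illustrative,
# not taken from the demo. The composition behind it would provision
# the GPU node group, install the GPU operator, and wire up vLLM.
apiVersion: platform.example.org/v1alpha1
kind: InferenceService
metadata:
  name: chat-model
spec:
  model: meta-llama/Llama-3.1-8B-Instruct  # open-weight model to serve
  gpu:
    instanceType: g5.xlarge                # GPU node instance type (example)
    count: 1
  replicas: 2                              # vLLM serving replicas
```

A platform team would publish the composition once; application teams then only author small claims like this, and each claim surfaces an OpenAI-compatible endpoint they can point existing clients at.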
21m watch time