Learn how standardizing AI workloads on Kubernetes has become an urgent industry priority, and how llm-d and a CNCF conformance program make that happen.


The New Stack

The CNCF's Kubernetes AI conformance program aims to standardize how AI and ML workloads run across cloud providers, eliminating the guesswork caused by differing GPU drivers, network setups, and autoscaling behaviors. With two-thirds of compute expected to be dedicated to inference by the end of 2026, the program focuses on portability and production readiness. Early adopters include major cloud providers, Red Hat, and Nvidia. The newly launched llm-d project, now in CNCF incubation, integrates vLLM into Kubernetes clusters and will collaborate with the conformance program. Initial requirements center on standardized accelerator exposure via Kubernetes' Dynamic Resource Allocation (DRA) feature, with networking and storage requirements planned as the program matures.

The next stages of AI conformance in the cloud-native, open-source world