Kubernetes 1.34 made Dynamic Resource Allocation (DRA) generally available, replacing the blunt `nvidia.com/gpu: 1` device plugin model with structured, attribute-based GPU requests. DRA introduces four core objects: ResourceSlice (describes available hardware), DeviceClass (groups devices), ResourceClaimTemplate (per-pod GPU requirements), and ResourceClaim (shared across pods). A practical walkthrough shows deploying a CUDA Mandelbrot fractal renderer with precise GPU requirements (Ampere architecture, 20+ GB memory), then scaling it using three GPU sharing strategies: time-slicing (sequential access), MPS (concurrent CUDA processes with configurable thread percentages), and MIG (hardware-level partitioning into up to seven isolated 1g.5gb slices on an A100). CAST AI's autoscaler integrates natively with DRA, reading ResourceClaims to provision the cheapest matching instance type automatically, including setting up MIG partitions.
Table of contents

- DRA in three minutes
- The demo workload
- Deploying with DRA with precise GPU requirements
- Sharing the GPU
- What CAST AI adds
- From one GPU to seven MIG partitions
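As a sketch of the kind of attribute-based request the walkthrough describes, the manifest below shows a ResourceClaimTemplate selecting an Ampere GPU with at least 20 Gi of memory via a CEL selector, and a pod that consumes it. The object and field names (`resource.k8s.io/v1`, `devices.requests[].exactly`, `selectors[].cel`) follow the Kubernetes 1.34 GA API; the driver domain `gpu.nvidia.com` and the specific attribute names (`architecture`, `memory`) are assumptions based on the NVIDIA DRA driver and may differ in your environment:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: ampere-20gb-template        # hypothetical name for illustration
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com   # DeviceClass published by the GPU driver
          selectors:
          - cel:
              # Match only Ampere-architecture devices with >= 20Gi of GPU memory.
              # Attribute names are assumed from the NVIDIA DRA driver.
              expression: >-
                device.attributes["gpu.nvidia.com"].architecture == "Ampere" &&
                device.capacity["gpu.nvidia.com"].memory.compareTo(quantity("20Gi")) >= 0
---
apiVersion: v1
kind: Pod
metadata:
  name: mandelbrot                  # hypothetical workload name
spec:
  resourceClaims:
  - name: gpu
    # Each pod created from this template gets its own ResourceClaim;
    # referencing a shared ResourceClaim by name would instead share one device.
    resourceClaimTemplateName: ampere-20gb-template
  containers:
  - name: renderer
    image: example.com/cuda-mandelbrot:latest   # placeholder image
    resources:
      claims:
      - name: gpu                   # binds the allocated device into this container
```

The scheduler resolves the CEL expression against published ResourceSlices, so the pod only lands on a node whose GPU actually satisfies both predicates, rather than on any node advertising a bare `nvidia.com/gpu` count.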