Kubernetes 1.34 made Dynamic Resource Allocation (DRA) generally available, replacing the blunt `nvidia.com/gpu: 1` device plugin model with structured, attribute-based GPU requests. DRA introduces four core objects: ResourceSlice (describes available hardware), DeviceClass (groups devices), ResourceClaimTemplate (per-pod GPU requirements), and ResourceClaim (shared across pods). A practical walkthrough shows deploying a CUDA Mandelbrot fractal renderer with precise GPU requirements (Ampere architecture, 20+ GB memory), then scaling it using three GPU sharing strategies: time-slicing (sequential access), MPS (concurrent CUDA processes with configurable thread percentages), and MIG (hardware-level partitioning into up to seven isolated 1g.5gb slices on an A100). CAST AI's autoscaler integrates natively with DRA, reading ResourceClaims to provision the cheapest matching instance type automatically, including setting up MIG partitions.
Table of contents

- DRA in three minutes
- The demo workload
- Deploying with DRA with precise GPU requirements
- Sharing the GPU
- What CAST AI adds
- From one GPU to seven MIG partitions
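As a sketch of the kind of attribute-based request the walkthrough describes, the manifest below shows a ResourceClaimTemplate selecting an Ampere GPU with at least 20 Gi of memory via a CEL selector, and a pod that consumes it. The object and field names (`resource.k8s.io/v1`, `devices.requests[].exactly`, `selectors[].cel`) follow the Kubernetes 1.34 GA API; the driver domain `gpu.nvidia.com` and the specific attribute names (`architecture`, `memory`) are assumptions based on the NVIDIA DRA driver and may differ in your environment:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: ampere-20gb-template        # hypothetical name for illustration
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com   # DeviceClass published by the GPU driver
          selectors:
          - cel:
              # Match only Ampere-architecture devices with >= 20Gi of GPU memory.
              # Attribute names are assumed from the NVIDIA DRA driver.
              expression: >-
                device.attributes["gpu.nvidia.com"].architecture == "Ampere" &&
                device.capacity["gpu.nvidia.com"].memory.compareTo(quantity("20Gi")) >= 0
---
apiVersion: v1
kind: Pod
metadata:
  name: mandelbrot                  # hypothetical workload name
spec:
  resourceClaims:
  - name: gpu
    # Each pod created from this template gets its own ResourceClaim;
    # referencing a shared ResourceClaim by name would instead share one device.
    resourceClaimTemplateName: ampere-20gb-template
  containers:
  - name: renderer
    image: example.com/cuda-mandelbrot:latest   # placeholder image
    resources:
      claims:
      - name: gpu                   # binds the allocated device into this container
```

The scheduler resolves the CEL expression against published ResourceSlices, so the pod only lands on a node whose GPU actually satisfies both predicates, rather than on any node advertising a bare `nvidia.com/gpu` count.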