Running Large-Scale GPU Workloads on Kubernetes with Slurm

Slinky is an open source project by SchedMD (now part of NVIDIA) that runs full Slurm clusters on Kubernetes infrastructure through the slurm-operator. The operator maps the Slurm daemons (slurmctld, slurmdbd, slurmd, and slurmrestd) to Kubernetes custom resources (CRDs) and pods. This enables high availability, autoscaling through the Horizontal Pod Autoscaler (HPA), and bidirectional state synchronization between Kubernetes and Slurm. Key integrations include the NVIDIA GPU Operator for automated GPU management, the DCGM Exporter for per-job GPU metrics, and ComputeDomains for multinode NVLink connectivity on GB200 hardware. NVIDIA runs this setup in production on clusters with 8,000+ GPUs for large-scale LLM training, with NCCL benchmark performance matching bare-metal Slurm. The recently released v1.1.0 adds dynamic topology support, DaemonSet-style scaling of worker pods, and automatic remediation of worker pods that fail to register. The main constraint is the current assumption of one worker pod per node, which makes Slinky best suited to multinode job workloads.
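The summary combines HPA-based autoscaling with a one-worker-pod-per-node assumption. The Python below is a rough sketch of how those two interact. It applies the Kubernetes HPA's documented ratio rule: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio falls within the default 0.1 tolerance. It then caps the result at the number of eligible GPU nodes.

Several parts are illustrative assumptions, not slurm-operator APIs:
- the per-worker metric (for example, average pending GPU jobs per worker pod)
- the node-count cap
- both helper functions

Real HPA behavior also applies stabilization windows, scaling policies, and a configured maxReplicas, all omitted here.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # The HPA leaves the replica count alone when the ratio is close to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def clamp_to_nodes(desired: int, min_replicas: int, eligible_nodes: int) -> int:
    """Hypothetical cap: with one worker pod per node, replicas cannot exceed eligible GPU nodes."""
    return max(min_replicas, min(desired, eligible_nodes))


if __name__ == "__main__":
    # Hypothetical scenario: 4 worker pods, 3.0 pending GPU jobs per pod, target 1.0, 8 GPU nodes.
    desired = hpa_desired_replicas(current_replicas=4, current_metric=3.0, target_metric=1.0)
    print(desired, clamp_to_nodes(desired, min_replicas=1, eligible_nodes=8))
```

In this scenario the HPA ratio asks for 12 workers. Only 8 nodes can each host one worker pod, so the pool stops at 8. This is one way to see why the 1:1 assumption favors jobs that span whole nodes.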

Apr 09 · 9 min read · From developer.nvidia.com

Table of contents

How does Slinky slurm-operator work?
How to deploy Slinky slurm-operator
What is the benefit of running Slurm on Kubernetes?
Slinky slurm-operator at scale
Slinky slurm-operator v1.1.0 release highlights
Get started with the Slinky slurm-operator
