Unlock Exascale Performance on NVIDIA GB200 NVL72 with Slurm Topology-Aware Job Scheduling

NVIDIA GB200 NVL72 delivers exascale compute in a single rack: 72 Blackwell GPUs connected via NVLink with 130 TB/s of aggregate bandwidth. To exploit this hardware in shared clusters, Slurm's topology/block plugin (introduced in Slurm 23.11 and developed with SchedMD) provides topology-aware job scheduling that aligns workloads with NVLink domain boundaries. The key recommendations are segment size 16 for large jobs (128+ GPUs, e.g., MoE training), segment size 4 for 32–64 GPU jobs, and segment size 1 for smaller jobs. In simulations of a 5,000-node cluster, the Large_Perf_Custom scheduling policy achieved GPU occupancy within about 1% of a topology-naive baseline while minimizing fragmentation, which suggests that topology-aware scheduling can deliver near-optimal utilization with little occupancy cost.
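The segment-size recommendations above can be written as a small lookup. The following is a minimal sketch, not NVIDIA's or SchedMD's tooling: the function names are hypothetical, and it assumes the segment size is passed to Slurm through the `--segment` option of `sbatch`, which should be checked against the installed Slurm version. The summary does not cover jobs between 65 and 127 GPUs, so the sketch assigns them to the middle tier as an explicit assumption.

```python
def recommended_segment_size(num_gpus: int) -> int:
    """Map a job's GPU count to a segment size, per the summary's tiers."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be a positive integer")
    if num_gpus >= 128:
        # Large jobs (128+ GPUs, e.g. MoE training): segment size 16.
        return 16
    if num_gpus >= 32:
        # 32-64 GPU jobs: segment size 4.
        # ASSUMPTION: 65-127 GPUs is not covered by the source;
        # this sketch places that range in the same tier.
        return 4
    # Smaller jobs: segment size 1.
    return 1


def segment_flag(num_gpus: int) -> str:
    """Build a command-line flag string (hypothetical helper).

    ASSUMPTION: the cluster's Slurm accepts `--segment=<size>` with the
    topology/block plugin enabled.
    """
    return f"--segment={recommended_segment_size(num_gpus)}"


if __name__ == "__main__":
    for gpus in (8, 32, 64, 128, 256):
        print(f"{gpus:>4} GPUs -> {segment_flag(gpus)}")
```

Keeping the tiers in one function makes it straightforward to adjust the thresholds once the full article's guidance, or a site's own measurements, are available.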

May 21 • 9 min read • From developer.nvidia.com

Table of contents

How does NVIDIA GB200 NVL72 deliver exascale compute?
What is topology-aware job scheduling?
How do cluster segmentation and job scheduling work on GB200 NVL72?
What are best practices for GB200 NVL72 segment sizing?
How to schedule jobs on GB200 NVL72 systems
What do the simulation results show?
What is the best job scheduling approach for GB200 NVL72?
Get started with NVIDIA GB200 NVL72
