JobSet is a new open-source API for managing distributed ML training and high-performance computing (HPC) workloads on Kubernetes. It addresses limitations of the existing Kubernetes Job API by supporting multi-template pods, job groups, inter-pod communication, and startup sequencing. JobSet models a workload as a group of Kubernetes Jobs, so the group can be scheduled and have its lifecycle managed as one unit. Key features include replicated jobs, automatic headless service creation, configurable success and failure policies, and integration with Kueue for capacity management.
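
To make the "group of Jobs" model concrete, here is a minimal sketch that assembles a JobSet manifest as a Python dictionary. It assumes the `jobset.x-k8s.io/v1alpha2` API version and the `replicatedJobs`, `failurePolicy`, and `successPolicy` fields as they appear in the JobSet project's published examples; the helper name `build_jobset_manifest`, the `busybox` image, and the `sleep` command are hypothetical placeholders, not part of the original post.

```python
import json


def build_jobset_manifest(name, replicas, pods_per_job, image, max_restarts=3):
    """Return a JobSet manifest as a plain dict (hypothetical helper)."""
    # Pod template shared by every pod in the replicated Job.
    pod_template = {
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                # Placeholder container; a real workload would run a trainer.
                {"name": "worker", "image": image, "command": ["sleep", "30"]}
            ],
        }
    }
    # Ordinary Kubernetes Job spec that JobSet stamps out `replicas` times.
    job_template = {
        "spec": {
            "parallelism": pods_per_job,
            "completions": pods_per_job,
            "backoffLimit": 0,
            "template": pod_template,
        }
    }
    return {
        "apiVersion": "jobset.x-k8s.io/v1alpha2",  # assumed API version
        "kind": "JobSet",
        "metadata": {"name": name},
        "spec": {
            # Restart the whole JobSet at most `max_restarts` times on failure.
            "failurePolicy": {"maxRestarts": max_restarts},
            # Succeed only when all Jobs in the "workers" group complete.
            "successPolicy": {
                "operator": "All",
                "targetReplicatedJobs": ["workers"],
            },
            "replicatedJobs": [
                {"name": "workers", "replicas": replicas, "template": job_template}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_jobset_manifest("demo", 3, 2, "busybox"), indent=2))
```

Because JSON is accepted wherever Kubernetes expects a manifest, the printed output could be saved to a file and applied with `kubectl apply -f` on a cluster where the JobSet controller and CRDs are installed. Field names should be checked against the JobSet version actually deployed.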

8 min read. From kubernetes.io
Table of contents
Why JobSet?
How JobSet Works
Example use case
Future work and getting involved
