Kubeflow Trainer v2.2 ships with several major additions: native JAX and XGBoost distributed training runtimes on Kubernetes, Flux Framework integration for HPC and MPI bootstrapping, real-time TrainJob progress and metrics observability (with Hugging Face Transformers integration), a new activeDeadlineSeconds API for job
Table of contents
Bringing JAX to Kubernetes with Trainer
Bringing XGBoost to Kubernetes with Trainer
Track TrainJob Progress and Expose Metrics
Bringing Flux Framework for HPC and MPI Bootstrapping
Resource Timeout for TrainJobs
RuntimePatches API to override TrainJob defaults
Breaking Changes
Release Notes
Roadmap Moving Forward
Join the Community