Learn about the key techniques and tools within the JAX and MaxText ecosystem to improve training efficiency on Ironwood TPUs.


Google's seventh-generation Ironwood TPU introduces several hardware and software co-design features for training trillion-parameter models. Key optimization strategies include: native FP8 support in the MXUs, applied through the Qwix library, for up to 2x throughput over BF16; Tokamax high-performance JAX kernels, including Splash Attention for long contexts and Megablox GMM for MoE models; offloading collective communication (All-Gather, Reduce-Scatter) to the fourth-generation SparseCores to free the TensorCores for compute; tuning the VMEM pipeline by optimizing tile sizes; and selecting a sharding strategy (FSDP, Tensor Parallelism, Expert Parallelism, Context Parallelism, or a hybrid) based on model architecture and sequence length. The MaxText framework integrates these techniques within the JAX ecosystem.
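To make the sharding idea concrete, here is a minimal sketch of an FSDP-style layout in plain JAX. It is not MaxText's implementation: the mesh axis name `fsdp`, the layer sizes, and the `forward` function are illustrative assumptions. It only shows how a weight and a batch can be split across one device axis, with the compiler left to insert the required collectives.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh. The axis name "fsdp" is an illustrative choice.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("fsdp",))
n = devices.size

# Hypothetical layer sizes, scaled by the device count so that the
# sharded dimensions always divide evenly.
batch, d_in, d_out = 8 * n, 64 * n, 32

# FSDP-style layout: the weight's first dimension and the batch dimension
# are both split across the same mesh axis.
w_sharding = NamedSharding(mesh, P("fsdp", None))
x_sharding = NamedSharding(mesh, P("fsdp", None))

w = jax.device_put(jnp.ones((d_in, d_out), dtype=jnp.bfloat16), w_sharding)
x = jax.device_put(jnp.ones((batch, d_in), dtype=jnp.bfloat16), x_sharding)

@jax.jit
def forward(x, w):
    # Under jit, the compiler adds whatever communication is needed to
    # compute the matmul from the sharded operands (for example, gathering w).
    return x @ w

y = forward(x, w)
print(y.shape, w.sharding)
```

Each device holds only a `1/n` slice of the weight at rest, which is the memory saving FSDP is chosen for. Tensor, expert, or context parallelism would use additional mesh axes and different `PartitionSpec` entries; the same code also runs on a single CPU device, where the mesh has one entry.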

Training large models on Ironwood TPUs

The Ironwood advantage: System-level performance