MaxText now supports Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on single-host TPU configurations (v5p-8 and v6e-8). Built on JAX and the Tunix library, these post-training capabilities let developers fine-tune existing MaxText or Hugging Face checkpoints on labeled datasets and apply RL algorithms such as GRPO and GSPO to complex reasoning tasks like math and coding. vLLM provides high-throughput inference inside the RL training loop. Setup requires installing `maxtext[tpu-post-train]==0.2.1`, and both SFT and RL workflows are designed to scale to multi-host configurations.
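The setup step above can be sketched as a shell command. The package extras name and version are taken from the text; the single quotes are a common convention to keep the shell from interpreting the brackets:

```shell
# Install MaxText with the TPU post-training extras (SFT + RL dependencies),
# pinned to the version named above.
pip install 'maxtext[tpu-post-train]==0.2.1'
```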
Table of contents
- Supervised Fine-Tuning (SFT): Precision Tuning Made Simple
- Reinforcement Learning (RL): Advancing Reasoning Capabilities
- Getting Started