MaxText now supports Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on single-host TPU configurations (v5p-8 and v6e-8). Built on JAX and the Tunix library, these post-training capabilities let developers fine-tune existing MaxText or Hugging Face checkpoints on labeled datasets and apply RL algorithms such as GRPO and GSPO to complex reasoning tasks like math and coding. vLLM provides high-throughput inference inside the RL training loop. Setup requires installing `maxtext[tpu-post-train]==0.2.1`, and both SFT and RL workflows are designed to scale to multi-host configurations.
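The setup step above can be sketched as a shell command. The package extras name and version are taken from the text; the single quotes are a common convention to keep the shell from interpreting the brackets:

```shell
# Install MaxText with the TPU post-training extras (SFT + RL dependencies),
# pinned to the version named above.
pip install 'maxtext[tpu-post-train]==0.2.1'
```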
Table of contents
- Supervised Fine-Tuning (SFT): Precision Tuning Made Simple
- Reinforcement Learning (RL): Advancing Reasoning Capabilities
- Getting Started