Meta's PyTorch team open-sourced torchforge, a PyTorch-native RL library designed to simplify large-scale post-training of LLMs. In collaboration with Stanford and CoreWeave, they demonstrated scalable reinforcement learning on a 512-GPU cluster using GRPO with Weaver as a verifier. Weaver aggregates multiple weak verifiers to provide reliable reward signals without human annotations, closing 44-65% of the performance gap between using a single reward model and fully human-annotated training across the MATH-500, GPQA, and MMLU Pro benchmarks. The stack combines torchforge for RL primitives, Weaver for verification, and Monarch for distributed coordination, enabling researchers to iterate on RL algorithms without rebuilding infrastructure. Results show 4x faster iteration, a >90% job completion rate, and >65% GPU utilization at scale.
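To make the weak-verifier idea concrete, here is a minimal sketch of the general technique: several unreliable binary verifiers each judge a candidate answer, and their votes are combined by reliability-weighted averaging into a single scalar reward. This is not the actual Weaver API; `aggregate_reward` and its parameters are hypothetical names chosen for illustration.

```python
# Hypothetical sketch, not the real Weaver implementation: combine several
# weak verifiers' binary judgments into one reward via a weighted average.

def aggregate_reward(judgments, weights):
    """Return a reward in [0, 1] from binary verifier votes.

    judgments: list of 0/1 verdicts, one per weak verifier.
    weights: estimated reliability of each verifier (positive floats).
    """
    if len(judgments) != len(weights):
        raise ValueError("need exactly one weight per verifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(j * w for j, w in zip(judgments, weights)) / total

# Three weak verifiers vote on one candidate solution; the second verifier
# is trusted most, so its vote dominates the aggregate reward.
reward = aggregate_reward([1, 1, 0], [0.5, 2.0, 0.8])
```

In an RL loop such as GRPO, a scalar like `reward` would stand in for a human annotation when scoring each sampled completion; Weaver's contribution is learning good verifier weightings rather than fixing them by hand.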

10 min read · From pytorch.org
Table of contents
- tl;dr
- torchforge: The Scalable RL Platform for LLMs
- Weaver To Forge: A Verifier for Reasoning Experimentation
- What is Weaver?
- Key Findings on Math, GPQA, and MMLU Pro
- Getting Started
