Meta's PyTorch team open-sourced torchforge, a PyTorch-native RL library designed to simplify large-scale post-training of LLMs. In collaboration with Stanford and CoreWeave, they demonstrated scalable reinforcement learning on a 512-GPU cluster using GRPO with Weaver as a verifier. Weaver aggregates multiple weak verifiers to provide reliable reward signals without human annotations, closing 44-65% of the performance gap between using a single reward model and fully human-annotated training across the MATH-500, GPQA, and MMLU Pro benchmarks. The stack combines torchforge for RL primitives, Weaver for verification, and Monarch for distributed coordination, enabling researchers to iterate on RL algorithms without rebuilding infrastructure. Results show 4x faster iteration, a >90% job completion rate, and >65% GPU utilization at scale.
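To make the weak-verifier idea concrete, here is a minimal sketch of the general technique: several unreliable binary verifiers each judge a candidate answer, and their votes are combined by reliability-weighted averaging into a single scalar reward. This is not the actual Weaver API; `aggregate_reward` and its parameters are hypothetical names chosen for illustration.

```python
# Hypothetical sketch, not the real Weaver implementation: combine several
# weak verifiers' binary judgments into one reward via a weighted average.

def aggregate_reward(judgments, weights):
    """Return a reward in [0, 1] from binary verifier votes.

    judgments: list of 0/1 verdicts, one per weak verifier.
    weights: estimated reliability of each verifier (positive floats).
    """
    if len(judgments) != len(weights):
        raise ValueError("need exactly one weight per verifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(j * w for j, w in zip(judgments, weights)) / total

# Three weak verifiers vote on one candidate solution; the second verifier
# is trusted most, so its vote dominates the aggregate reward.
reward = aggregate_reward([1, 1, 0], [0.5, 2.0, 0.8])
```

In an RL loop such as GRPO, a scalar like `reward` would stand in for a human annotation when scoring each sampled completion; Weaver's contribution is learning good verifier weightings rather than fixing them by hand.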

10 min read · From pytorch.org
Table of contents
- tl;dr
- torchforge: The Scalable RL Platform for LLMs
- Weaver To Forge: A Verifier for Reasoning Experimentation
- What is Weaver?
- Key Findings on Math, GPQA, and MMLU Pro
- Getting Started
