Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves


Researchers at Tencent AI Lab and Washington University developed R-Zero, a framework that enables large language models to train themselves without human-labeled data. The system pairs two co-evolving models, a Challenger that generates progressively harder tasks and a Solver that learns from them, creating a self-improving loop. Tests on models such as Qwen3 showed significant gains in math reasoning that also transferred to general reasoning tasks. While promising for reducing training costs and bypassing data-curation bottlenecks, the approach faces challenges: the accuracy of self-generated answers declines over successive iterations, and it currently works best in objective domains like mathematics.
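The Challenger/Solver dynamic described above can be illustrated with a toy simulation. This is a hypothetical sketch, not R-Zero's actual training procedure: the two LLMs are replaced by scalar "difficulty" and "skill" values, and the adaptation rules (raise difficulty when the Solver succeeds more than half the time, ease off otherwise) are illustrative assumptions.

```python
import random

class Challenger:
    """Toy stand-in for the task-generating model."""
    def __init__(self):
        self.difficulty = 1.0

    def generate_task(self):
        return self.difficulty

    def adapt(self, solve_rate):
        # Illustrative rule: keep tasks near the Solver's frontier by
        # raising difficulty when it succeeds often, lowering it otherwise.
        if solve_rate > 0.5:
            self.difficulty *= 1.2
        else:
            self.difficulty *= 0.9

class Solver:
    """Toy stand-in for the learning model."""
    def __init__(self):
        self.skill = 1.0

    def attempt(self, difficulty):
        # Probability of solving falls as tasks outpace current skill.
        return random.random() < self.skill / (self.skill + difficulty)

    def learn(self, difficulty, solved):
        if solved:
            # Successful attempts on hard tasks raise skill the most.
            self.skill += 0.1 * difficulty

random.seed(0)
challenger, solver = Challenger(), Solver()
for _ in range(10):  # ten co-evolution rounds
    tasks = [challenger.generate_task() for _ in range(20)]
    results = [solver.attempt(t) for t in tasks]
    for t, ok in zip(tasks, results):
        solver.learn(t, ok)
    challenger.adapt(sum(results) / len(results))

print(f"final skill={solver.skill:.2f}, difficulty={challenger.difficulty:.2f}")
```

The point of the loop is that neither side needs labeled data: the Challenger's reward signal is the Solver's success rate, and the Solver's curriculum is whatever the Challenger currently produces.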

7-minute read. From venturebeat.com.
Table of contents
- The challenge of self-evolving LLMs
- How R-Zero works
- R-Zero in action
