Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves
Tencent AI Lab and Washington University researchers developed R-Zero, a framework that enables large language models to train themselves without human-labeled data. The system pairs two co-evolving models: a Challenger that generates progressively harder tasks, and a Solver that learns from them, creating a self-improving loop. Tests on models such as Qwen3 showed significant gains in math reasoning that transferred to general reasoning tasks. While promising for reducing training costs and bypassing data-curation bottlenecks, the approach suffers from declining answer accuracy over successive iterations and currently works best in objective domains like mathematics.
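The co-evolving loop can be pictured with a toy simulation. This is a minimal sketch, not the paper's actual method: the class names, the reward shape (learning peaks when the Solver succeeds about half the time), and the numeric update rule are all assumptions made for illustration.

```python
# Toy sketch of a Challenger/Solver co-evolution loop in the spirit of
# R-Zero. All names and formulas here are illustrative assumptions.

class Solver:
    def __init__(self, skill=1.0):
        self.skill = skill

    def success_rate(self, difficulty):
        # Chance of solving drops as difficulty exceeds current skill.
        return max(0.0, min(1.0, 1.0 - (difficulty - self.skill)))

    def train(self, rate):
        # Learning signal peaks at ~50% success (rate * (1 - rate) is
        # maximized at rate = 0.5), mimicking "edge of ability" training.
        self.skill += 0.1 * rate * (1 - rate) * 4


class Challenger:
    def propose(self, solver):
        # Target a difficulty where the Solver succeeds about half the time.
        return solver.skill + 0.5


def co_evolve(iterations=5):
    solver, challenger = Solver(), Challenger()
    for _ in range(iterations):
        difficulty = challenger.propose(solver)
        rate = solver.success_rate(difficulty)
        solver.train(rate)
    return solver.skill


print(round(co_evolve(), 3))
```

In this toy version the Challenger always stays half a step ahead, so the Solver trains at a constant 50% success rate and its skill climbs steadily. In the real framework, both sides are LLMs updated with reinforcement learning, and the Solver's answers are pseudo-labeled by majority vote, which is where the accuracy decline over iterations comes from.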