Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves
Tencent AI Lab and Washington University researchers developed R-Zero, a framework that enables large language models to train themselves without human-labeled data. The system pairs two co-evolving models: a Challenger that generates progressively harder tasks, and a Solver that learns from them, creating a self-improving loop. Tests on models such as Qwen3 showed significant gains in math reasoning that transferred to general reasoning tasks. While promising for reducing training costs and bypassing data-curation bottlenecks, the approach suffers from declining answer accuracy over successive iterations and currently works best in objective domains like mathematics.
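The co-evolving loop can be pictured with a toy simulation. This is a minimal sketch, not the paper's actual method: the class names, the reward shape (learning peaks when the Solver succeeds about half the time), and the numeric update rule are all assumptions made for illustration.

```python
# Toy sketch of a Challenger/Solver co-evolution loop in the spirit of
# R-Zero. All names and formulas here are illustrative assumptions.

class Solver:
    def __init__(self, skill=1.0):
        self.skill = skill

    def success_rate(self, difficulty):
        # Chance of solving drops as difficulty exceeds current skill.
        return max(0.0, min(1.0, 1.0 - (difficulty - self.skill)))

    def train(self, rate):
        # Learning signal peaks at ~50% success (rate * (1 - rate) is
        # maximized at rate = 0.5), mimicking "edge of ability" training.
        self.skill += 0.1 * rate * (1 - rate) * 4


class Challenger:
    def propose(self, solver):
        # Target a difficulty where the Solver succeeds about half the time.
        return solver.skill + 0.5


def co_evolve(iterations=5):
    solver, challenger = Solver(), Challenger()
    for _ in range(iterations):
        difficulty = challenger.propose(solver)
        rate = solver.success_rate(difficulty)
        solver.train(rate)
    return solver.skill


print(round(co_evolve(), 3))
```

In this toy version the Challenger always stays half a step ahead, so the Solver trains at a constant 50% success rate and its skill climbs steadily. In the real framework, both sides are LLMs updated with reinforcement learning, and the Solver's answers are pseudo-labeled by majority vote, which is where the accuracy decline over iterations comes from.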