In this video we dive into rStar-Math, a System 2 style approach that lets Small Language Models (SLMs) match OpenAI o1-level performance in mathematical reasoning!
rStar-Math is introduced in a recent research paper by Microsoft, titled "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking".

In this video, we'll cover:
 • The concept of System 1 and System 2 thinking in AI
 • Deep thinking with Monte Carlo Tree Search (MCTS)
 • The rStar-Math framework

Paper page - https://arxiv.org/abs/2501.04519
Code (not yet released at the time of writing) - https://github.com/microsoft/rStar
Written review - https://aipapersacademy.com/rstar-math/

-----------------------------------------------------------------------------------------------
✉️ Join the newsletter - https://aipapersacademy.com/newsletter/

👍 Please like & subscribe if you enjoy this content

Become a patron - https://www.patreon.com/aipapersacademy

The video was edited using VideoScribe - https://tidd.ly/44TZEiX
-----------------------------------------------------------------------------------------------

Chapters:
0:00 Introduction
1:59 MCTS Deep Thinking
4:12 Code-augmented CoT
5:02 rStar-Math Overview
9:43 Results

AI Papers Academy

Microsoft's rStar-Math paper demonstrates that small language models (SLMs) can rival OpenAI's o1 model in mathematical reasoning by applying System 2 deep thinking via Monte Carlo Tree Search (MCTS). The framework uses two models: a policy model that generates candidate reasoning steps and a process preference model (PPM) that scores those steps so the search can favor the most promising ones. The PPM is trained on step preferences derived from MCTS Q-values. A key innovation is code-augmented chain-of-thought, which pairs each natural language reasoning step with executable Python code, so steps whose code fails to run can be filtered out. The system self-evolves over four training rounds using 747k math problems, bootstrapping from a 236B-parameter model before transitioning to smaller models. The resulting 7B-parameter model is competitive with or surpasses OpenAI o1-preview on math benchmarks.

rStar-Math by Microsoft: Can SLMs Beat OpenAI o1 in Math?