In 2024, the field of large language models (LLMs) saw significant specialization, including the development of reasoning models designed to excel at complex tasks like puzzles, advanced math, and coding challenges. This post discusses four main approaches to building reasoning models: inference-time scaling, pure reinforcement learning (RL), supervised fine-tuning combined with reinforcement learning (SFT + RL), and model distillation. It highlights DeepSeek-R1 as a leading example of a reasoning model and compares its methodologies and efficiency with OpenAI's o1. The value of developing smaller, distilled models on a limited budget is also emphasized, presenting a cost-effective alternative for researchers and engineers.
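Of the four approaches mentioned above, inference-time scaling is the easiest to illustrate. One common flavor is majority voting (self-consistency): sample several answers to the same prompt and keep the most frequent one. The sketch below is only illustrative and not taken from the post; `generate_answer` is a hypothetical stand-in for an actual LLM call sampled with a nonzero temperature.

```python
# Minimal sketch of inference-time scaling via majority voting (self-consistency):
# sample several answers from the same model and return the most common one.
from collections import Counter
import random


def generate_answer(prompt: str) -> str:
    # Hypothetical placeholder: a real implementation would sample from an LLM
    # with temperature > 0 so repeated calls can produce different answers.
    return random.choice(["42", "42", "41"])


def majority_vote(prompt: str, num_samples: int = 8) -> str:
    # Draw several independent samples, then keep the most frequent answer.
    samples = [generate_answer(prompt) for _ in range(num_samples)]
    answer, _count = Counter(samples).most_common(1)[0]
    return answer


if __name__ == "__main__":
    print(majority_vote("What is 6 * 7?"))
```

The trade-off is straightforward: accuracy tends to improve with more samples, but inference cost grows linearly with `num_samples`.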

20 min read, from sebastianraschka.com
Table of contents
- How do we define “reasoning model”?
- When should we use reasoning models?
- A brief look at the DeepSeek training pipeline
- The 4 main ways to build and improve reasoning models
- Thoughts about DeepSeek R1
- Developing reasoning models on a limited budget
