In 2024, the field of large language models (LLMs) saw significant specialization, including the development of reasoning models designed to excel at complex tasks like puzzles, advanced math, and coding challenges. This post discusses four main approaches to building reasoning models: inference-time scaling, pure reinforcement learning (RL), supervised fine-tuning combined with reinforcement learning (SFT + RL), and model distillation. It highlights DeepSeek-R1 as a leading example of a reasoning model and compares its methodologies and efficiency with OpenAI's o1. The value of developing smaller, distilled models on a limited budget is also emphasized, presenting a cost-effective alternative for researchers and engineers.
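Of the four approaches mentioned above, inference-time scaling is the easiest to illustrate. One common flavor is majority voting (self-consistency): sample several answers to the same prompt and keep the most frequent one. The sketch below is only illustrative and not taken from the post; `generate_answer` is a hypothetical stand-in for an actual LLM call sampled with a nonzero temperature.

```python
# Minimal sketch of inference-time scaling via majority voting (self-consistency):
# sample several answers from the same model and return the most common one.
from collections import Counter
import random


def generate_answer(prompt: str) -> str:
    # Hypothetical placeholder: a real implementation would sample from an LLM
    # with temperature > 0 so repeated calls can produce different answers.
    return random.choice(["42", "42", "41"])


def majority_vote(prompt: str, num_samples: int = 8) -> str:
    # Draw several independent samples, then keep the most frequent answer.
    samples = [generate_answer(prompt) for _ in range(num_samples)]
    answer, _count = Counter(samples).most_common(1)[0]
    return answer


if __name__ == "__main__":
    print(majority_vote("What is 6 * 7?"))
```

The trade-off is straightforward: accuracy tends to improve with more samples, but inference cost grows linearly with `num_samples`.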

20 min read, from sebastianraschka.com
Table of contents
- How do we define “reasoning model”?
- When should we use reasoning models?
- A brief look at the DeepSeek training pipeline
- The 4 main ways to build and improve reasoning models
- Thoughts about DeepSeek R1
- Developing reasoning models on a limited budget
