A recent paper titled 'Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning' investigates why RL enables reasoning in large language models and what causes the 'aha moments' observed during training. The paper proposes that LLMs already contain latent hierarchical reasoning from pretraining, and that RL unlocks it in two phases: the model first masters low-level procedural execution, then shifts to expanding high-level strategic planning. This phase shift is what produces the aha moments. The researchers introduce a metric, semantic diversity, to track how strategic planning evolves, and propose HICRA (Hierarchy-Aware Credit Assignment), a modification of GRPO that amplifies learning signals on strategic-planning tokens. HICRA consistently outperforms GRPO across mathematical and multimodal reasoning benchmarks.
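The core idea of hierarchy-aware credit assignment can be sketched as reweighting per-token advantages so that tokens identified as strategic planning receive a stronger learning signal. This is a minimal illustration, not the paper's implementation: the function name, the `alpha` amplification factor, and the binary planning mask are all hypothetical, and how planning tokens are actually identified is left out.

```python
import numpy as np

def amplify_planning_advantages(advantages, planning_mask, alpha=0.5):
    """Hypothetical sketch of hierarchy-aware credit assignment.

    advantages:    per-token advantages from a GRPO-style baseline, shape (T,)
    planning_mask: 1 where a token is judged strategic planning, else 0
    alpha:         amplification strength (illustrative value, not from the paper)

    Planning tokens get their advantage scaled by (1 + alpha); execution
    tokens keep their original advantage.
    """
    advantages = np.asarray(advantages, dtype=float)
    planning_mask = np.asarray(planning_mask, dtype=float)
    return advantages * (1.0 + alpha * planning_mask)

# Example: three tokens, the last two flagged as planning tokens.
scaled = amplify_planning_advantages([1.0, -1.0, 2.0], [0, 1, 1], alpha=0.5)
# Execution token unchanged (1.0); planning tokens amplified (-1.5, 3.0).
```

The design choice here is that amplification is applied symmetrically: a negative advantage on a planning token is also magnified, so bad strategic choices are penalized more strongly, not just good ones rewarded.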