A new model architecture called Hierarchical Reasoning Model (HRM) achieves strong performance on hard reasoning benchmarks using only 27 million parameters and 1,000 training examples, outperforming models like DeepSeek-R1, Claude 3.7, and o3-mini-high. Unlike Transformer-based LLMs that rely on external chain-of-thought reasoning, HRM performs latent reasoning internally in a single forward pass. It uses two coupled recurrent modules: a high-level module for abstract planning and a low-level module for detailed computation. These modules interact in nested loops, enabling deep computational reasoning without the memory and instability issues of standard backpropagation through time. An adaptive halting mechanism trained with reinforcement learning allows the model to dynamically adjust reasoning depth based on task complexity.
Sort: