How Can A Model 10,000× Smaller Outsmart ChatGPT?

This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).

The Tiny Recursion Model (TRM) challenges the assumption that bigger AI models are smarter. With only 5-7 million parameters, TRM uses a recursive loop architecture — maintaining three state vectors (question, hypothesis, latent reasoning) — to iteratively refine answers rather than making a single forward pass. On the Sudoku-Extreme benchmark, TRM achieved 87.4% accuracy while GPT o3-mini, Claude 3.7, and DeepSeek R1 scored 0%. On ARC-AGI-1, TRM's 7M parameter model hit 44.6% accuracy, outperforming DeepSeek R1 (671B params, 15.8%) and Claude 3.7 (28.6%). A counterintuitive finding: doubling TRM's depth from 2 to 4 layers reduced accuracy from 87.4% to 79.5%, suggesting extra capacity encourages memorization over deduction. TRM also uses Adaptive Computation Time to dynamically decide when to stop iterating, allocating compute only where needed. The core thesis: depth in time beats depth in space.

#llm

Apr 01•10m read time•From towardsdatascience.com

Table of contents

1. Introduction 2. The Fragility of the Giants 3. Tiny Recursive Models: Trading Space for Time 4. The Results 5. Conclusion References

Comment

Bookmark

Copy

Sort: