LLMs that can "think" and "reason" have become increasingly popular. But what is a model actually doing when it's "thinking," and how can we train LLMs to be better at reasoning? This explainer video covers the fundamentals of how thinking models work, including concepts like scaling laws, test-time compute, and reinforcement learning from verifiable rewards.

Resources: 

Gemini thinking → https://goo.gle/3KaUL0J 

Subscribe to Google for Developers → https://goo.gle/developers 

Speaker: Nikita Namjoshi 
Products Mentioned: Google AI

Google for Developers

Thinking models (reasoning models) improve LLM performance on complex tasks by using more compute during response generation. Chain-of-thought prompting demonstrates that generating intermediate reasoning steps leads to better answers. Test-time compute strategies include generating multiple responses and selecting the best one using reward models. Reinforcement learning during post-training teaches models to produce longer reasoning chains that correlate with improved performance, while supervised fine-tuning provides the foundation for consistent output formatting.
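
To make the "generate multiple responses and select the best one" idea concrete, here is a minimal best-of-N sketch in Python. It is an illustration under stated assumptions, not the method used by any particular model: `best_of_n`, `make_toy_generator`, and `toy_reward` are hypothetical names, the generator stands in for sampling from an LLM, and the reward function stands in for a learned reward model or a verifier.

```python
import itertools
from typing import Callable, Iterable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n candidate responses and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # More candidates means more compute spent at inference time.
    candidates = [generate(prompt) for _ in range(n)]
    # Score every candidate and keep the (score, response) pair with the top score.
    scored = [(score(prompt, response), response) for response in candidates]
    best_score, best_response = max(scored, key=lambda pair: pair[0])
    return best_response, best_score


def make_toy_generator(answers: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: cycles through canned answers."""
    cycle = itertools.cycle(answers)

    def generate(prompt: str) -> str:
        return next(cycle)

    return generate


def toy_reward(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: rewards the answer "4"."""
    return 1.0 if response.strip() == "4" else 0.0


if __name__ == "__main__":
    generator = make_toy_generator(["5", "4", "22"])
    answer, reward = best_of_n("What is 2 + 2?", generator, toy_reward, n=3)
    print(answer, reward)
```

In a real system the generator would sample from the model with a nonzero temperature so the candidates differ, and the scorer would be a trained reward model or an automatic checker. The selection step itself stays this simple.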

How do thinking and reasoning models work?