A comprehensive 2025 review of large language model developments highlights reinforcement learning with verifiable rewards (RLVR) and the GRPO algorithm as the year's dominant training paradigm, following DeepSeek R1's breakthrough. Key trends include inference-time scaling, tool-use integration, and architectural efficiency.
33m read time • From sebastianraschka.com
Table of contents
1. The Year of Reasoning, RLVR, and GRPO
2. GRPO, the Research Darling of the Year
3. LLM Architectures: A Fork in the Road?
4. It’s Also The Year of Inference-Scaling and Tool Use
5. Word of the Year: Benchmaxxing
6. AI for Coding, Writing, and Research
7. The Edge: Private data
8. Building LLMs and Reasoning Models From Scratch
9. Surprises in 2025 and Predictions for 2026
10. Bonus: A Curated LLM Research Papers List (July to December 2025)