DeepSeek V3.2 represents a significant evolution in open-weight language models, introducing DeepSeek Sparse Attention (DSA) for improved efficiency and self-verification techniques from DeepSeekMath V2 for enhanced reasoning. The model keeps the Multi-Head Latent Attention (MLA) and Mixture-of-Experts architecture from V3 while adding learned token selection in place of fixed sliding windows. Training improvements include domain-specific KL tuning in GRPO, off-policy sequence masking, and a hybrid RLVR setup that combines symbolic verification with LLM-as-judge rewards. The release achieves GPT-5-level performance through architectural efficiency gains and refined reinforcement learning methods, and V3.2-Speciale offers extended thinking capabilities via reduced length penalties during training.
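To make the learned-token-selection idea concrete, here is a minimal PyTorch sketch of top-k sparse attention: an indexer scores key positions per query, and attention is computed only over the highest-scoring tokens rather than a fixed sliding window. The function name, the dense `indexer_scores` matrix, and the `top_k` value are illustrative assumptions, not DeepSeek's implementation (DSA uses a separate lightweight indexer and integrates with MLA).

```python
import torch
import torch.nn.functional as F

def sparse_attention_topk(q, k, v, indexer_scores, top_k=64):
    """Toy sketch of learned top-k token selection for attention.

    q, k, v: (seq_len, d) tensors for a single head.
    indexer_scores: (seq_len, seq_len) learned relevance scores
                    (hypothetical stand-in for a lightweight indexer).
    """
    seq_len = q.size(0)
    k_eff = min(top_k, seq_len)

    # For each query, keep only the k_eff highest-scoring key positions.
    top_idx = indexer_scores.topk(k_eff, dim=-1).indices   # (seq_len, k_eff)
    k_sel = k[top_idx]                                      # (seq_len, k_eff, d)
    v_sel = v[top_idx]                                      # (seq_len, k_eff, d)

    # Standard scaled dot-product attention, restricted to the selected tokens.
    scores = torch.einsum("qd,qkd->qk", q, k_sel) / q.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)
    return torch.einsum("qk,qkd->qd", weights, v_sel)
```

The point of the sketch is the asymptotics: with a fixed `top_k`, the attention cost per query stays constant as the context grows, whereas which tokens get attended to is learned rather than dictated by a fixed window.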

From sebastianraschka.com · 27 min read
Table of contents
1. The DeepSeek Release Timeline
2. Hybrid Versus Dedicated Reasoning Models
3. From DeepSeek V3 to V3.1
4. DeepSeek V3.2-Exp and Sparse Attention
5. DeepSeekMath V2 with Self-Verification and Self-Refinement
6. DeepSeek V3.2 (Dec 1, 2025)
7. Conclusion
