DeepSeek V3.2 represents a significant evolution in open-weight language models, introducing DeepSeek Sparse Attention (DSA) for improved efficiency and self-verification techniques from DeepSeekMath V2 for enhanced reasoning. The model keeps the Multi-Head Latent Attention (MLA) and Mixture-of-Experts architecture from V3 while adding learned token selection in place of fixed sliding windows. Training improvements include domain-specific KL tuning in GRPO, off-policy sequence masking, and a hybrid RLVR setup that combines symbolic verification with LLM-as-judge rewards. The release achieves GPT-5-level performance through architectural efficiency gains and refined reinforcement learning methods, and V3.2-Speciale offers extended thinking capabilities via reduced length penalties during training.
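To make the learned-token-selection idea concrete, here is a minimal PyTorch sketch of top-k sparse attention: an indexer scores key positions per query, and attention is computed only over the highest-scoring tokens rather than a fixed sliding window. The function name, the dense `indexer_scores` matrix, and the `top_k` value are illustrative assumptions, not DeepSeek's implementation (DSA uses a separate lightweight indexer and integrates with MLA).

```python
import torch
import torch.nn.functional as F

def sparse_attention_topk(q, k, v, indexer_scores, top_k=64):
    """Toy sketch of learned top-k token selection for attention.

    q, k, v: (seq_len, d) tensors for a single head.
    indexer_scores: (seq_len, seq_len) learned relevance scores
                    (hypothetical stand-in for a lightweight indexer).
    """
    seq_len = q.size(0)
    k_eff = min(top_k, seq_len)

    # For each query, keep only the k_eff highest-scoring key positions.
    top_idx = indexer_scores.topk(k_eff, dim=-1).indices   # (seq_len, k_eff)
    k_sel = k[top_idx]                                      # (seq_len, k_eff, d)
    v_sel = v[top_idx]                                      # (seq_len, k_eff, d)

    # Standard scaled dot-product attention, restricted to the selected tokens.
    scores = torch.einsum("qd,qkd->qk", q, k_sel) / q.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)
    return torch.einsum("qk,qkd->qd", weights, v_sel)
```

The point of the sketch is the asymptotics: with a fixed `top_k`, the attention cost per query stays constant as the context grows, whereas which tokens get attended to is learned rather than dictated by a fixed window.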

From sebastianraschka.com · 27 min read
Table of contents
1. The DeepSeek Release Timeline
2. Hybrid Versus Dedicated Reasoning Models
3. From DeepSeek V3 to V3.1
4. DeepSeek V3.2-Exp and Sparse Attention
5. DeepSeekMath V2 with Self-Verification and Self-Refinement
6. DeepSeek V3.2 (Dec 1, 2025)
7. Conclusion
