DeepSeek-V4 is a new frontier open model designed specifically for long-running agentic workloads. It introduces a hybrid attention architecture combining Compressed Sparse Attention (CSA, 4x compression) and Heavily Compressed Attention (HCA, 128x compression), cutting KV cache memory to roughly 2% of what standard grouped-query attention (GQA) requires. V4-Pro needs only 27% of V3.2's single-token inference FLOPs, and V4-Flash drops to 10%. Key agent-specific improvements include reasoning traces preserved across tool-call boundaries and user turns, a new XML-based tool-call schema with dedicated tokens to reduce parsing failures, and a Rust-based sandbox infrastructure (DSec) used for RL training against real tool environments. On agent benchmarks, V4-Pro-Max reaches 80.6 on SWE Verified, 73.6 on MCPAtlas, and 67.9 on Terminal Bench 2.0, placing it at parity with frontier closed models. Four model checkpoints (Pro and Flash, each in instruct and base variants) are available on the Hugging Face Hub.
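For a back-of-envelope sense of the ~2% figure, the sketch below combines the two published compression ratios over a hypothetical layer split; the 4-CSA / 60-HCA mix and the 64-layer depth are illustrative assumptions, not numbers from the post.

```python
# Rough KV-cache estimate relative to a GQA baseline (= 1.0 per layer).
# ASSUMPTION: the 4-CSA / 60-HCA split below is hypothetical; the post gives
# only the per-mechanism compression ratios and the ~2% overall figure.
CSA = 1 / 4    # Compressed Sparse Attention: 4x compression
HCA = 1 / 128  # Heavily Compressed Attention: 128x compression

csa_layers, hca_layers = 4, 60
total_layers = csa_layers + hca_layers

relative_cache = (csa_layers * CSA + hca_layers * HCA) / total_layers
print(f"KV cache vs. GQA: {relative_cache:.1%}")  # ~2.3%, in line with "roughly 2%"
```

A mix weighted heavily toward HCA layers is what makes the overall figure land near 2% despite CSA compressing only 4x.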
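The post does not reproduce the tool-call schema itself. As a rough illustration of why a dedicated XML format is easier to parse robustly than free-form tool calls embedded in prose, here is a minimal sketch; the tag names (`tool_call`, `name`, `arg`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: this schema is made up for illustration; the post only states
# that V4 uses an XML-based tool-call format delimited by dedicated tokens.
raw = """<tool_call>
  <name>read_file</name>
  <arguments>
    <arg key="path">src/main.rs</arg>
  </arguments>
</tool_call>"""

call = ET.fromstring(raw)
name = call.findtext("name")
args = {a.get("key"): a.text for a in call.iter("arg")}
print(name, args)  # read_file {'path': 'src/main.rs'}
```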
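A minimal sketch of loading one of the checkpoints with `transformers`; the repo id below is an assumption based on DeepSeek's usual Hub naming, so check the actual model cards for the four published paths.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# ASSUMPTION: repo id is illustrative; the post names four checkpoints
# (Pro/Flash x instruct/base) but not their exact Hub paths.
repo = "deepseek-ai/DeepSeek-V4-Flash"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Summarize the KV cache savings in V4."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```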

From huggingface.co · 7 min read
Table of contents
- The KV cache problem for agents
- Hybrid attention: CSA and HCA
- What changes for agents
- Agent benchmark results
- Using the models