Kimi Linear introduces a hybrid linear attention architecture built around Kimi Delta Attention (KDA), a refined version of Gated DeltaNet with an improved, finer-grained gating mechanism. The 48B-parameter model (3B activated) supports a 1M-token context length, reduces KV cache requirements by 75%, and achieves up to 6x faster decoding throughput.
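Concretely, a DeltaNet-style layer maintains a fixed-size fast-weight state instead of a growing KV cache, and the gate controls how that state decays between tokens. The sketch below is a minimal, unoptimized single-token step assuming this general shape; the function name, the gate placement, and the decay-before-erase ordering are illustrative assumptions, not the repository's actual (chunked, hardware-efficient) kernel:

```python
import torch

def gated_delta_step(S, q, k, v, a, beta):
    """One hypothetical step of a per-channel gated delta rule.

    S    : (d_k, d_v) fast-weight state carried across tokens
    q, k : (d_k,) query / key for the current token (k assumed normalized)
    v    : (d_v,) value for the current token
    a    : (d_k,) forget gate in (0, 1); a plain Gated DeltaNet uses a
           single scalar here, and KDA's refinement is to gate per channel
    beta : scalar write-strength gate in (0, 1)
    """
    S = a.unsqueeze(1) * S                   # channel-wise decay of the state
    S = S - beta * torch.outer(k, k @ S)     # delta-rule "erase" along k
    S = S + beta * torch.outer(k, v)         # delta-rule "write" of (k, v)
    o = q @ S                                # readout, shape (d_v,)
    return S, o

if __name__ == "__main__":
    d_k, d_v = 64, 64
    S = torch.zeros(d_k, d_v)
    q, v = torch.randn(d_k), torch.randn(d_v)
    k = torch.nn.functional.normalize(torch.randn(d_k), dim=0)
    a = torch.sigmoid(torch.randn(d_k))          # per-channel forget gate
    beta = torch.sigmoid(torch.randn(())).item() # scalar write gate
    S, o = gated_delta_step(S, q, k, v, a, beta)
    print(o.shape)  # torch.Size([64])
```

Because the state `S` stays a fixed `d_k x d_v` matrix regardless of sequence length, KDA layers need no per-token cache; in the hybrid stack, only the remaining full-attention layers keep a conventional KV cache, which is where the 75% reduction comes from.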
