Kimi Linear introduces a hybrid linear-attention architecture built around Kimi Delta Attention (KDA), a refined variant of Gated DeltaNet with an improved gating mechanism. The 48B-parameter model (3B activated) supports a 1M-token context length, reduces KV-cache requirements by 75%, and achieves up to 6x faster decoding throughput.
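As a rough intuition for the delta-rule family that KDA belongs to, the sketch below shows one recurrent step of a gated delta update over a fast-weight state. This is an illustrative simplification, not Kimi's actual implementation: the function name, the per-channel decay gate `alpha`, and the scalar write strength `beta` are assumptions chosen to mirror the general Gated DeltaNet formulation.

```python
import numpy as np

def gated_delta_step(S, k, v, beta, alpha):
    """One gated delta-rule step (illustrative sketch, not KDA itself).

    S:     (d_k, d_v) fast-weight state acting as the recurrent memory
    k:     (d_k,) key, assumed L2-normalized
    v:     (d_v,) value to store
    beta:  scalar write strength in (0, 1)
    alpha: (d_k,) per-channel decay gate in (0, 1); a single scalar here
           would recover a coarser, DeltaNet-style forget gate
    """
    S = alpha[:, None] * S                # channel-wise forgetting
    pred = S.T @ k                        # what the state currently recalls for k
    S = S + beta * np.outer(k, v - pred)  # delta-rule correction toward v
    return S

# Readout at each step would then be o = S.T @ q for a query q.
```

With `beta = 1` and no decay, a single step makes the state recall `v` exactly for a unit-norm key `k`; smaller `beta` and `alpha < 1` trade recall fidelity for stability, which is the knob the gating mechanism learns. Because the state `S` is a fixed-size matrix rather than a growing KV cache, memory stays constant in sequence length.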