A comprehensive visual guide covering the major attention variants used in modern open-weight LLMs. Starting from the fundamentals of Multi-Head Attention (MHA), it progresses through Grouped-Query Attention (GQA), Multi-Head Latent Attention (MLA), Sliding Window Attention (SWA), DeepSeek Sparse Attention (DSA), Gated Attention, and Hybrid Attention.
Table of contents
1. Multi-Head Attention (MHA)
2. Grouped-Query Attention (GQA)
3. Multi-Head Latent Attention (MLA)
4. Sliding Window Attention (SWA)
5. DeepSeek Sparse Attention (DSA)
6. Gated Attention
7. Hybrid Attention
Conclusion