DeepSeek's manifold-constrained Hyper-Connections (mHC) addresses a critical instability in transformer architectures. While standard residual connections use a single information stream, Hyper-Connections expand to multiple parallel streams with learnable mixing matrices. However, unconstrained mixing matrices can amplify
7m read time • From taylorkolasinski.com
Table of contents
The Setup
The Explosion
The Fix: Constrain the Manifold
The Results
Why This Matters
Takeaways
What’s Next
Resources