MIT CSAIL researchers developed CompreSSM, a technique that compresses AI state-space models during training rather than after it. Using Hankel singular values from control theory, the method identifies unimportant model dimensions after just 10% of training and removes them, so the remaining 90% of training runs at the speed of a smaller model. On Mamba, it achieves roughly 4x training speedups while maintaining competitive accuracy. CompreSSM outperforms both post-training pruning and knowledge distillation, and it is 40x faster than spectral regularization alternatives. The approach is theoretically grounded via Weyl's theorem and targets multi-input, multi-output state-space architectures, with planned extensions toward linear attention and transformer-adjacent models.
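The core idea, ranking state dimensions by Hankel singular values (HSVs) and discarding the weak ones, can be sketched with textbook square-root balanced truncation. This is a minimal sketch under stated assumptions, not CompreSSM's implementation. It assumes a single linear time-invariant, discrete-time system with fixed, Schur-stable `A` and multi-input, multi-output `B`, `C`. Mamba's selective SSMs make their parameters input-dependent, and the summary does not say how CompreSSM handles that. The function names (`gramians`, `balanced_truncation`, `markov_parameters`, `choose_rank`) and the 99% `keep_fraction` rule are hypothetical. The Gramians come from SciPy's `solve_discrete_lyapunov`.

```python
# Hypothetical helper names throughout: a textbook square-root balanced
# truncation sketch for a fixed discrete-time LTI system, NOT CompreSSM's code.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def gramians(A, B, C):
    """Controllability (P) and observability (Q) Gramians of
    x[t+1] = A x[t] + B u[t], y[t] = C x[t]; A must be Schur-stable."""
    P = solve_discrete_lyapunov(A, B @ B.T)    # A P A^T - P + B B^T = 0
    Q = solve_discrete_lyapunov(A.T, C.T @ C)  # A^T Q A - Q + C^T C = 0
    return P, Q


def _psd_factor(M):
    """Return L with L @ L.T == M for a symmetric positive semidefinite M."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return V * np.sqrt(np.clip(w, 0.0, None))  # scale column j by sqrt(w[j])


def balanced_truncation(A, B, C, k):
    """Keep the k states with the largest Hankel singular values (HSVs).
    Returns the reduced (A, B, C) and all HSVs in descending order."""
    P, Q = gramians(A, B, C)
    Lc, Lo = _psd_factor(P), _psd_factor(Q)
    # The singular values of Lo^T Lc are the HSVs, i.e. sqrt(eig(P Q)).
    U, hsv, Vt = np.linalg.svd(Lo.T @ Lc)
    if hsv[k - 1] <= 0:
        raise ValueError("cannot keep a state whose HSV is zero")
    s_inv_sqrt = np.diag(1.0 / np.sqrt(hsv[:k]))
    T = Lc @ Vt[:k].T @ s_inv_sqrt        # n x k map from kept states back
    W = s_inv_sqrt @ U[:, :k].T @ Lo.T    # k x n, satisfies W @ T = I_k
    return W @ A @ T, W @ B, C @ T, hsv


def markov_parameters(A, B, C, steps):
    """Impulse-response terms C B, C A B, C A^2 B, ... stacked on axis 0."""
    out, X = [], B
    for _ in range(steps):
        out.append(C @ X)
        X = A @ X
    return np.stack(out)


def choose_rank(hsv, keep_fraction=0.99):
    """Hypothetical rule: smallest k whose top-k HSVs hold keep_fraction of the sum."""
    cumulative = np.cumsum(hsv) / np.sum(hsv)
    return min(int(np.searchsorted(cumulative, keep_fraction)) + 1, len(hsv))


if __name__ == "__main__":
    # State 3 is uncontrollable (zero row in B), so its HSV is ~0.
    A = np.diag([0.9, 0.5, 0.1])
    B = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    C = np.array([[1.0, 1.0, 1.0], [0.0, 1.0, 1.0]])
    Ar, Br, Cr, hsv = balanced_truncation(A, B, C, k=2)
    print("Hankel singular values:", hsv)
    err = markov_parameters(A, B, C, 50) - markov_parameters(Ar, Br, Cr, 50)
    print("max impulse-response error:", np.abs(err).max())
```

Truncating a state whose Hankel singular value is near zero leaves the input-output behavior essentially unchanged, which is why small HSVs mark dimensions that are safe to remove. The appeal to Weyl's theorem fits this picture. Weyl's inequality bounds how far each singular value can move under a perturbation, |σᵢ(M + E) − σᵢ(M)| ≤ ‖E‖₂, so a ranking computed early in training may stay reliable if later updates are small. The exact argument the authors use is not given in this summary.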

6 min read · From news.mit.edu