Mamba-3 is a new state space model (SSM) designed with inference efficiency as the primary goal, in contrast with Mamba-2's focus on training speed. Key improvements include a more expressive recurrence via exponential-trapezoidal discretization, complex-valued state tracking, and a MIMO (multi-input, multi-output) variant that boosts accuracy without increasing decode latency. At the 1.5B scale, Mamba-3 SISO beats Mamba-2, Gated DeltaNet, and Llama-3.2-1B on prefill+decode latency across all sequence lengths. The architecture also removes the short causal convolution of Mamba-1/2, adds QKNorm for training stability, and uses RoPE to realize the complex-valued SSM. The kernels are open-sourced, written in a mix of Triton, TileLang, and CuTe DSL to maximize hardware performance on Hopper GPUs.
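To make the discretization claim concrete, here is a minimal NumPy sketch contrasting a plain exponential (Euler-style) state update with an exponential-trapezoidal one, where the input is weighted across both endpoints of each step. This is an illustration of the general trapezoidal rule for a scalar state, not the paper's exact parameterization; the function names and the scalar simplification are hypothetical.

```python
import numpy as np

def euler_step(h, a, b, x, dt):
    """Exponential Euler-style update: decay the state,
    then inject only the current input (Mamba-2 flavor)."""
    return np.exp(dt * a) * h + dt * b * x

def trapezoidal_step(h, a, b_prev, x_prev, b, x, dt):
    """Exponential-trapezoidal update (illustrative): the state
    still decays exponentially, but the input integral over the
    step is approximated by the trapezoidal rule, so each update
    blends the current and previous inputs."""
    decay = np.exp(dt * a)
    inject = 0.5 * dt * (decay * b_prev * x_prev + b * x)
    return decay * h + inject

# Toy scan over a short sequence with a scalar state
# (real SSM layers use vector/matrix-valued states).
h_e = h_t = 0.0
a, b, dt = -1.0, 1.0, 0.5
b_prev = x_prev = 0.0
for x in [1.0, 0.0, 0.0, 2.0]:
    h_e = euler_step(h_e, a, b, x, dt)
    h_t = trapezoidal_step(h_t, a, b_prev, x_prev, b, x, dt)
    b_prev, x_prev = b, x
    print(f"euler: {h_e:+.4f}  trapezoidal: {h_t:+.4f}")
```

Because the trapezoidal update looks one token back, it yields a higher-order (and thus more expressive) recurrence than the single-input exponential step, at essentially the same per-step cost.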

From together.ai · 16 min read
Table of contents

- The Mamba-3 model
- Architecture
- Empirical results
- Kernels here, there, and everywhere
- Next up
- References
