IBM released Granite 4.0, a family of open-source small language models optimized for speed and low cost. The models use a hybrid architecture that combines Mamba-2 layers, whose compute scales linearly with sequence length, with Transformer attention layers for precision, plus mixture-of-experts (MoE) routing that activates only a subset of the parameters for each token. This design makes it possible to run 30B-parameter models on consumer GPUs.
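
To make the MoE idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not IBM's implementation: the function names (`top_k_route`, `moe_forward`), the toy scalar experts, and the choice of `k=2` are all assumptions made for illustration. It only shows why compute depends on the number of selected experts rather than on the total parameter count.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    routes = top_k_route(router_logits, k)
    y = sum(w * experts[i](x) for i, w in routes)
    return y, [i for i, _ in routes]

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for one token; only 2 of the 4 experts run.
y, used = moe_forward(1.0, experts, router_logits=[0.1, 2.0, -1.0, 2.0], k=2)
print(used, y)
```

In a real model the experts are feed-forward networks and the router logits come from a learned linear layer, but the selection step follows the same pattern: unselected experts are never evaluated for that token.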
From replicate.com (3 min read)