PrismML has released Ternary Bonsai, a family of 1.58-bit language models in 8B, 4B, and 1.7B parameter sizes. The models use ternary weights {-1, 0, +1} throughout the entire network (embeddings, attention, MLPs, and LM head), giving them a memory footprint roughly 9x smaller than that of standard 16-bit models. The 8B variant fits in 1.75 GB and posts an average benchmark score of 75.5, outperforming all comparable-size models except Qwen3 8B (which is ~9x larger in memory). On Apple M4 Pro hardware, the 8B model runs at 82 tokens/sec with ~5x better energy efficiency than 16-bit counterparts. The models are released under Apache 2.0 and run natively on Apple devices via MLX. The release builds on PrismML's earlier 1-bit Bonsai family and offers a new tradeoff point between memory and performance.
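The "1.58-bit" label and the ~9x figure can be checked with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not PrismML's own accounting: it assumes a nominal count of exactly 8 billion parameters and 1 GB = 1e9 bytes, and the helper `footprint_gb` is a hypothetical name introduced here for illustration. A three-valued weight carries log2(3), about 1.585 bits of information, which is where the 1.58 figure comes from.

```python
import math

# Information content of one ternary weight in {-1, 0, +1}.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585 bits


def footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in GB (assuming 1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 8e9    # assumption: nominal "8B" taken as exactly 8 billion
REPORTED_GB = 1.75  # size of the 8B variant as stated in the summary

fp16_gb = footprint_gb(NUM_PARAMS, 16)                            # 16.0 GB
ternary_ideal_gb = footprint_gb(NUM_PARAMS, BITS_PER_TERNARY_WEIGHT)  # about 1.585 GB

# Bits per weight implied by the reported 1.75 GB under the assumptions above.
effective_bits = REPORTED_GB * 1e9 * 8 / NUM_PARAMS               # 1.75

# Reduction relative to a 16-bit model of the same nominal size.
reduction = fp16_gb / REPORTED_GB                                 # about 9.14

print(f"16-bit baseline:        {fp16_gb:.2f} GB")
print(f"ideal ternary packing:  {ternary_ideal_gb:.3f} GB")
print(f"implied bits/weight:    {effective_bits:.2f}")
print(f"reduction vs 16-bit:    {reduction:.2f}x")
```

Under these assumptions the reported 1.75 GB works out to about 1.75 bits per weight, slightly above the 1.585-bit ideal, and to a reduction of roughly 9.1x, consistent with the "roughly 9x" claim. The summary does not say where the extra fraction of a bit goes; packing overhead or stored scale factors are plausible but unconfirmed explanations.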
Table of contents
A true ternary model
Benchmark performance
Extending the Pareto frontier
Throughput and energy use
Platform Coverage
Join Us