UltraMem, introduced by ByteDance, is a sparse model architecture designed to improve computational efficiency and reduce inference latency for large language models (LLMs). It builds on Product Key Memory (PKM), combining ultra-sparse memory layers with a Pre-LayerNorm Transformer backbone, and is reported to outperform both Mixture-of-Experts (MoE) and PKM models. ByteDance reports inference up to six times faster than comparable MoE models, with favorable scaling behavior and resource efficiency.
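To make the PKM foundation concrete, here is a minimal sketch of a classic product-key lookup in plain Python. It is not UltraMem's implementation, and it assumes the simplest form of the technique: the query is split into two halves, each half is scored against its own small set of sub-keys, and only the resulting k × k candidate slots are ranked. All names (`pkm_lookup`, `row_keys`, `col_keys`, `values`) are hypothetical and chosen for illustration.

```python
import math


def top_k(scores, k):
    """Return (index, score) pairs for the k largest scores, best first."""
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)[:k]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pkm_lookup(query, row_keys, col_keys, values, k):
    """Product-key lookup over an implicit grid of len(row_keys) * len(col_keys) slots.

    Illustrative sketch only: names and layout are assumptions, not UltraMem's API.
    """
    half = len(query) // 2
    q_row, q_col = query[:half], query[half:]

    # Score each half of the query against its own small set of sub-keys.
    best_rows = top_k([dot(q_row, key) for key in row_keys], k)
    best_cols = top_k([dot(q_col, key) for key in col_keys], k)

    # Only k * k candidate slots are scored, never the full grid.
    candidates = [
        (i * len(col_keys) + j, s_i + s_j)
        for i, s_i in best_rows
        for j, s_j in best_cols
    ]
    selected = sorted(candidates, key=lambda p: p[1], reverse=True)[:k]

    # Softmax over the selected scores, then a weighted sum of their value rows.
    peak = max(score for _, score in selected)
    exps = [math.exp(score - peak) for _, score in selected]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    output = [0.0] * dim
    for (slot, _), w in zip(selected, weights):
        for d in range(dim):
            output[d] += w * values[slot][d]
    return [slot for slot, _ in selected], weights, output


# Tiny example: a 2 x 2 grid (4 memory slots) with one value per slot.
keys = [[1.0, 0.0], [0.0, 1.0]]
vals = [[0.0], [10.0], [20.0], [30.0]]
print(pkm_lookup([2.0, 0.0, 0.0, 3.0], keys, keys, vals, k=1))
# prints ([1], [1.0], [10.0])
```

The point of the design is cost: with n sub-keys per half, the lookup addresses n² memory slots while computing only 2n sub-key scores plus k² candidate sums, which is what allows the memory to be very large and still sparsely accessed.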