Huawei has developed Pangu Ultra MoE, a sparse language model with 718 billion parameters that is optimized for Ascend NPUs. The work uses a simulation-driven approach to architecture design to address the challenges of efficiently training large Mixture of Experts (MoE) models on this hardware. Key innovations include dynamic expert placement, adaptive system strategies that improve computational load balance across devices, and fine-grained memory optimizations. Pangu Ultra MoE achieves competitive results across a range of benchmark evaluations, marking a notable advance in system-aware AI design.
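The summary names dynamic expert placement and computation balance without describing the mechanism. For intuition only, here is a minimal sketch of one generic way to balance MoE compute: greedily assign experts to devices by observed token load, heaviest first (the classic longest-processing-time heuristic). This is an assumption for illustration. It is not the Pangu Ultra MoE algorithm. The function name `place_experts` and the load values are hypothetical.

```python
import heapq


def place_experts(expert_loads, num_devices):
    """Assign experts to devices so per-device token load stays balanced.

    Hypothetical illustration using a greedy longest-processing-time heuristic;
    not the method used by Pangu Ultra MoE.
    """
    # Min-heap of (current_load, device_id): the least-loaded device is on top.
    heap = [(0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    placement = {d: [] for d in range(num_devices)}

    # Place the heaviest experts first, each onto the currently lightest device.
    for expert, load in sorted(enumerate(expert_loads), key=lambda x: -x[1]):
        dev_load, dev = heapq.heappop(heap)
        placement[dev].append(expert)
        heapq.heappush(heap, (dev_load + load, dev))
    return placement


if __name__ == "__main__":
    loads = [10, 8, 7, 5, 3, 1]  # hypothetical tokens routed to each expert
    print(place_experts(loads, num_devices=2))
```

With these example loads, device 0 receives experts 0, 3 and 4 (load 18) and device 1 receives experts 1, 2 and 5 (load 16). A "dynamic" variant would rerun the assignment periodically as routing statistics change during training.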