Researchers from Princeton University and Meta AI introduce Lory, a fully differentiable Mixture-of-Experts (MoE) model designed for autoregressive language model pre-training. Lory outperforms dense models on language modeling and downstream tasks, and relies on causal segment routing and similarity-based data batching techniques.
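To make the routing idea concrete, here is a minimal toy sketch of causal segment routing with soft expert merging, the mechanism that keeps a merged-expert MoE fully differentiable. All names (`merge_experts`, the toy router, segment sizes) are illustrative assumptions, not the authors' implementation: the sequence is split into segments, expert parameters are averaged with routing weights, and the gate applied to each segment is computed from the *previous* segment, preserving autoregressive causality.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def merge_experts(experts, gate):
    # Soft-merge expert weight matrices into a single matrix
    # via a convex combination -- differentiable w.r.t. the gate.
    # experts: (n_experts, d, d), gate: (n_experts,)
    return np.tensordot(gate, experts, axes=1)

rng = np.random.default_rng(0)
n_experts, d = 4, 8
experts = rng.standard_normal((n_experts, d, d))  # toy expert FFN weights
router = rng.standard_normal((d, n_experts))      # toy linear router

seq = rng.standard_normal((6, d))       # 6 tokens of dimension d
segments = seq.reshape(2, 3, d)         # 2 segments of 3 tokens each

prev_gate = np.full(n_experts, 1.0 / n_experts)  # uniform gate for segment 0
outputs = []
for seg in segments:
    # Merge experts ONCE per segment using the gate from the previous segment.
    merged = merge_experts(experts, prev_gate)   # (d, d)
    outputs.append(seg @ merged)
    # Gate for the NEXT segment is computed from the current segment,
    # so no token ever routes on information from its own future.
    prev_gate = softmax(seg.mean(axis=0) @ router)

out = np.stack(outputs)  # (2, 3, d)
```

The key point the sketch illustrates: because each segment passes through one merged weight matrix (rather than discretely selecting experts per token), gradients flow to every expert through the gate, avoiding the non-differentiable top-k routing of conventional MoE layers.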