Researchers from Princeton and Meta AI introduce Lory, a fully-differentiable MoE model designed for autoregressive language model pre-training. Lory outperforms comparable dense models on language modeling and downstream tasks, using causal segment routing and similarity-based data batching techniques.
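To illustrate what "fully-differentiable MoE" means here: instead of hard top-k expert selection, expert parameters are soft-merged into a single set of weights using routing probabilities, so gradients flow through the router. The sketch below is a minimal, hypothetical illustration of that merging idea using NumPy; the function names and shapes are assumptions for clarity, not Lory's actual implementation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over router logits.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def merge_experts(expert_params, router_logits):
    """Soft-merge expert parameter matrices into one matrix.

    expert_params: list of same-shaped weight matrices, one per expert.
    router_logits: one logit per expert (hypothetical router output).
    Returns the merged matrix and the routing probabilities.
    """
    probs = softmax(router_logits)
    # Weighted parameter average: differentiable w.r.t. the router,
    # unlike hard top-k selection.
    merged = sum(p * w for p, w in zip(probs, expert_params))
    return merged, probs

# Usage: two toy 2x2 "experts" merged with equal routing logits.
experts = [np.ones((2, 2)), np.zeros((2, 2))]
merged, probs = merge_experts(experts, np.array([0.0, 0.0]))
```

Because the merge is a weighted average rather than a discrete choice, the whole forward pass stays differentiable; in Lory, routing decisions are additionally made per segment in a causal manner so that a segment's routing cannot peek at its own future tokens.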

4m read time · From marktechpost.com
