mixture-of-experts
Researchers from Princeton and Meta AI Introduce ‘Lory’: A Fully-Differentiable MoE Model Designed for Autoregressive Language Model Pre-Training
This AI Paper by DeepSeek-AI Introduces DeepSeek-V2: Harnessing Mixture-of-Experts for Enhanced AI Performance
Mixture of Expert Architecture. Definitions and Applications included Google’s Gemini and Mixtral 8x7B
Bringing MegaBlocks to Databricks
How do mixture-of-experts layers affect transformer models?
Alibaba Releases Qwen1.5-MoE-A2.7B: A Small MoE Model with only 2.7B Activated Parameters yet Matching the Performance of State-of-the-Art 7B models like Mistral 7B
Create Mixtures of Experts with MergeKit
Understanding the Sparse Mixture of Experts (SMoE) Layer in Mixtral
Mixtral-8x7B: Overview and Benchmarks with Combining Mixtral and Flash Attention 2
All posts about mixture-of-experts