Qwen1.5-MoE-A2.7B is a Mixture-of-Experts (MoE) model in the Qwen1.5 series of Large Language Models (LLMs) developed by the Qwen team at Alibaba Cloud. Despite activating only 2.7 billion parameters per token, it performs on par with 7B-parameter models such as Mistral 7B and Qwen1.5-7B. Its architecture combines fine-grained experts with a sparse MoE routing mechanism that activates only a small subset of experts for each token.
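To make the idea concrete, here is a minimal PyTorch sketch of a fine-grained, top-k-routed MoE layer: many small expert MLPs, a router that scores them per token, and a weighted combination of the selected experts' outputs. The class name, layer sizes, expert count, and top-k value are illustrative assumptions; the real model's expert configuration and routing details differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedMoE(nn.Module):
    """Illustrative sparse MoE layer (not Qwen's actual implementation):
    many small ("fine-grained") experts, a router picking top-k per token,
    and outputs combined by the normalized routing weights."""

    def __init__(self, d_model=512, num_experts=64, top_k=4, d_expert=128):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small two-layer MLP; "fine-grained" means many small experts
        # rather than a few large ones.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                          nn.Linear(d_expert, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.router(x)                  # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

# Usage: route a batch of 8 token vectors through the sparse layer.
layer = FineGrainedMoE()
y = layer(torch.randn(8, 512))
print(y.shape)  # torch.Size([8, 512])
```

Because only top_k experts run per token, the compute per token stays close to that of a small dense MLP even though total parameter count grows with the number of experts, which is the trade-off that lets a 2.7B-activated-parameter model compete with dense 7B models.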