Comparison of MoE, Dense, and Hybrid LLM architectures. MoE allows model size and output quality to grow without a corresponding increase in compute cost. Hybrid-MoE combines a residual MoE with a dense transformer for faster training and inference. Snowflake AI Research ran experiments along these lines and developed the Arctic model, which combines a dense transformer with an MoE transformer for improved performance at lower compute cost.
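To make the three variants concrete, the sketch below shows a dense feed-forward block, a top-k routed MoE block, and a hybrid block that adds a residual MoE on top of a dense FFN. This is a minimal illustration only: the class names, dimensions, and routing scheme are assumptions for exposition and are not Snowflake's Arctic implementation.

```python
# Minimal sketch of dense, MoE, and hybrid-MoE feed-forward blocks (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseFFN(nn.Module):
    """Standard transformer feed-forward block: every token uses all parameters."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))


class MoEFFN(nn.Module):
    """Sparse MoE block: each token is routed to its top-k experts, so total
    parameters grow with num_experts while per-token compute stays roughly constant."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([DenseFFN(d_model, d_ff) for _ in range(num_experts)])
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):
        b, s, d = x.shape
        tokens = x.reshape(-1, d)
        gate_logits = self.router(tokens)                    # (tokens, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(tokens[mask])
        return out.reshape(b, s, d)


class HybridBlock(nn.Module):
    """Hybrid-MoE: a dense FFN processes every token, and a residual MoE adds
    extra capacity without a matching increase in per-token FLOPs."""
    def __init__(self, d_model: int, d_ff_dense: int, d_ff_expert: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.dense = DenseFFN(d_model, d_ff_dense)
        self.moe = MoEFFN(d_model, d_ff_expert, num_experts, top_k)

    def forward(self, x):
        return x + self.dense(x) + self.moe(x)


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)  # (batch, seq, d_model)
    block = HybridBlock(d_model=64, d_ff_dense=256, d_ff_expert=128)
    print(block(x).shape)       # torch.Size([2, 16, 64])
```

In this toy setup the MoE block holds eight experts' worth of parameters but each token only activates two of them, which is the sense in which MoE grows capacity without growing per-token compute; the hybrid block keeps a dense path for every token and treats the MoE as a residual correction.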

Table of contents
MoE vs Dense vs Hybrid LLM Architectures
Transformer Architectures
Dense Transformer
MoE Transformer
Hybrid-MoE Transformer
Snowflake’s MoE Experiments
Optimum MoE Architecture
Snowflake Arctic: A Hybrid-MoE
Comparing 600M Dense / MoE / Hybrid-MoE models
Wandb Reports
Conclusion
