Allen AI (Ai2) releases EMO, a 1B-active/14B-total-parameter mixture-of-experts (MoE) model that achieves emergent modularity without predefined domain labels. Unlike standard MoEs, whose experts specialize in surface-level syntactic patterns (prepositions, articles), EMO is trained with all tokens in a document constrained to route through a shared expert pool, so its experts organize into semantically meaningful domain clusters (health, code, math). This lets users deploy just 12.5% of the total experts for a specific task while retaining near-full-model performance: a 3% absolute drop, versus severe degradation in standard MoEs. The full model, baseline, training code, and an interactive visualization tool are all being released openly.
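To make the routing constraint concrete, here is a minimal, hypothetical sketch of a document-constrained MoE layer. It is not Ai2's released code: the class name `DocConstrainedMoE`, the sizes (64 experts, pools of 8, top-2 routing), and the way the pool is supplied are all illustrative assumptions. The router's logits are masked so every token in a document can only be dispatched to that document's shared expert pool, and at inference a task needs only that pool loaded.

```python
# Illustrative sketch only -- not Ai2's implementation. Sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DocConstrainedMoE(nn.Module):
    """MoE layer whose routing is masked to a per-document expert pool."""

    def __init__(self, d_model=512, n_experts=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x, doc_pool):
        # x: (tokens, d_model) -- all tokens belong to the same document.
        # doc_pool: 1-D tensor of expert ids shared by every token in that document.
        logits = self.router(x)                                # (tokens, n_experts)
        mask = torch.full_like(logits, float("-inf"))
        mask[:, doc_pool] = 0.0                                # only the pool is routable
        probs = F.softmax(logits + mask, dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)          # top-k taken within the pool
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():
                sel = idx[:, slot] == e                        # tokens whose slot-th choice is expert e
                out[sel] += weights[sel, slot, None] * self.experts[int(e)](x[sel])
        return out

# A task mapped to one pool only needs those experts loaded:
# 8 of 64 experts = 12.5% of the expert parameters.
layer = DocConstrainedMoE()
tokens = torch.randn(16, 512)            # tokens from one document
pool = torch.randperm(64)[:8]            # that document's shared expert pool
print(layer(tokens, pool).shape)         # torch.Size([16, 512])
```

Because the pool is fixed per document rather than chosen per token, experts that share a pool repeatedly co-occur on whole documents, which is the pressure toward domain-level rather than syntactic specialization described above.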
Table of contents
- How do we get modularity to emerge?
- Benchmark results
- What are expert subsets specializing to?
- What we're releasing