Meta AI's CoCoMix (Continuous Concept Mixing) is a novel LLM pre-training framework that augments standard next-token prediction with continuous concepts derived from a sparse autoencoder. During training, a pre-trained sparse autoencoder extracts meaningful semantic features from hidden states, and attribution scoring selects the most impactful concepts as training labels. The model learns to predict these concepts and mixes them back into the hidden state sequence before passing to subsequent transformer layers. Benchmarks on a 1.38B parameter model trained on 200B tokens show CoCoMix achieves equivalent perplexity with 21.5% fewer training tokens and higher accuracy on downstream tasks. The framework also enables interpretability and steerability by amplifying specific concept predictions to influence model outputs.
Sort: