Olmo 3 is Allen AI's fully open large language model, released in 7B and 32B parameter versions. The release includes complete access to the models, the training datasets (Dolma 3, a 9.3-trillion-token corpus), the training code, and the training logs. Training follows a three-stage pipeline: pretraining on Dolma 3 Mix, mid-training on Dolma 3 Dolmino for skill enhancement, and long-context extension on Dolma 3 Longmino. Post-training uses the Dolci suite, combining SFT, DPO, and RLVR. Architecturally, the 32B model employs grouped query attention, while the 7B uses standard multi-head attention. OlmoTrace lets you trace model output back to its training sources, supporting auditing and contamination detection.
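The grouped query attention (GQA) versus multi-head attention (MHA) distinction matters mainly for inference memory: in GQA, several query heads share a single key/value head, which shrinks the KV cache. Below is a minimal, self-contained sketch of the mechanism in PyTorch; the dimensions are toy values chosen for illustration, not Olmo 3's actual configuration, and setting `n_kv_heads` equal to `n_q_heads` recovers standard MHA.

```python
# Illustrative sketch of grouped query attention (GQA), not Olmo 3's real code.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Attention where groups of query heads share one key/value head.
    With n_q_heads == n_kv_heads this reduces to standard MHA."""
    B, T, _ = x.shape
    head_dim = wq.shape[1] // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per KV head

    q = (x @ wq).view(B, T, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(B, T, n_kv_heads, head_dim).transpose(1, 2)

    # Broadcast each KV head across its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    attn = F.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(B, T, n_q_heads * head_dim)
    return out

# Toy usage: 8 query heads sharing 2 KV heads. The KV projections (and
# therefore the KV cache) are 4x smaller than their MHA equivalents.
d_model, head_dim = 64, 8
x = torch.randn(1, 4, d_model)
wq = torch.randn(d_model, 8 * head_dim)
wk = torch.randn(d_model, 2 * head_dim)
wv = torch.randn(d_model, 2 * head_dim)
y = grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2)
print(y.shape)  # torch.Size([1, 4, 64])
```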
Table of contents
Introduction
Prerequisites
Key Takeaways
Model Architecture
Data Curation
OlmoTrace
Olmo 3 on DigitalOcean
References and Additional Resources
FAQ
Final Thoughts