Arcee AI released Trinity Large, a 400B-parameter sparse Mixture-of-Experts (MoE) model with 13B active parameters per token (256 experts, 4 active per token). The model was trained on 17T tokens across 2,048 Nvidia B300 GPUs in 33 days, at a total cost of $20M. Three variants are available: Preview (lightly post-trained, chat-ready),
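The sparse-MoE numbers above (256 experts, 4 active per token) can be illustrated with a toy top-k router. This is a hypothetical sketch of the general technique, not Arcee's implementation; all names, the hidden size, and the routing details are illustrative assumptions.

```python
import numpy as np

# Toy sketch of sparse MoE top-k routing: 256 experts, 4 active per token.
# All names and shapes are illustrative, not Arcee's actual implementation.
NUM_EXPERTS = 256
TOP_K = 4
HIDDEN = 8  # toy hidden size

rng = np.random.default_rng(0)
token = rng.standard_normal(HIDDEN)                  # one token's hidden state
router_w = rng.standard_normal((HIDDEN, NUM_EXPERTS))

logits = token @ router_w                 # router score for every expert
top_k = np.argsort(logits)[-TOP_K:]       # indices of the 4 chosen experts
weights = np.exp(logits[top_k])
weights /= weights.sum()                  # softmax over only the selected experts

# Only these 4 experts run for this token, which is why ~13B of the
# 400B total parameters are active per token.
print(sorted(top_k.tolist()), float(weights.sum()))
```

Because only the top-4 experts execute per token, per-token compute scales with the active-parameter count (13B) rather than the full 400B.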
8 min read · From arcee.ai