Kimi K2.5 from Moonshot AI introduces three notable research directions: vision-based coding, via native multimodal training with a 1:9 vision-to-text ratio and a custom MoonViT3D encoder; an agent swarm trained with Parallel Agent Reinforcement Learning (PARL), which lets an orchestrator spawn and schedule hundreds of sub-agents concurrently and reduces execution time by 3-4.5x; and an ultra-sparse Mixture-of-Experts architecture with 1 trillion total parameters, of which only 32B are activated per token across 384 experts. The model was continually trained on 15 trillion mixed tokens on top of Kimi K2's 15-trillion-token pre-training, so the continued-training budget matches the original pre-training budget, an unusual scale for continued training. Key innovations include zero-vision SFT, which teaches visual tool use without high-quality vision data, and a critical-path reward metric intended to prevent RL gaming in multi-agent setups.
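To make the numbers above concrete, here is a minimal Python sketch. The first part only restates the summary's arithmetic (share of parameters active per token, combined token budget). The second part illustrates one plausible reading of a critical-path measure: for each stage, count the orchestrator's steps plus the steps of the slowest parallel sub-agent. That definition, the function names, and the example stage data are all assumptions made for illustration; they are not taken from the Kimi K2.5 work.

```python
from typing import List, Tuple

# A stage is (orchestrator_steps, [steps taken by each parallel sub-agent]).
# This representation is hypothetical, chosen only for the illustration.
Stage = Tuple[int, List[int]]


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_params / total_params


def critical_path_steps(stages: List[Stage]) -> int:
    """Steps on the longest dependency chain (assumed definition):
    sub-agents within a stage run in parallel, so only the slowest counts."""
    return sum(orch + (max(subs) if subs else 0) for orch, subs in stages)


def serial_steps(stages: List[Stage]) -> int:
    """Steps if every sub-agent ran one after another."""
    return sum(orch + sum(subs) for orch, subs in stages)


# Figures quoted in the summary: 1T total parameters, 32B active per token.
sparsity = active_fraction(1e12, 32e9)

# 15T pre-training tokens (Kimi K2) plus 15T continued-training tokens.
total_tokens = 15e12 + 15e12

# Made-up workload: two stages with three and two parallel sub-agents.
example_stages: List[Stage] = [(1, [4, 6, 5]), (2, [3, 3])]
critical = critical_path_steps(example_stages)
serial = serial_steps(example_stages)

print(f"active share per token: {sparsity:.1%}")
print(f"combined token budget: {total_tokens:.0e}")
print(f"critical path: {critical} steps, serial: {serial} steps")
```

Under this reading, a reward tied to critical-path length only improves when parallelism shortens the longest chain, so spawning extra sub-agents that do not shorten it earns nothing. Whether the actual metric works this way is not stated in the summary.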