Sarvam AI is open-sourcing two reasoning models: Sarvam 30B and Sarvam 105B, both trained entirely in India under the IndiaAI mission. Both use a Mixture-of-Experts Transformer architecture with sparse expert routing. The 30B model uses Grouped Query Attention for efficient real-time deployment, while the 105B uses Multi-head Latent Attention for long-context inference. Training covered 16T tokens (30B) and 12T tokens (105B) across code, web, math, and multilingual data including 10 major Indian languages. A custom RL pipeline using an asynchronous GRPO architecture with adaptive sampling was developed in-house. On benchmarks, Sarvam 105B is competitive with frontier models like DeepSeek R1 and o4-mini, while both models achieve state-of-the-art results on Indian language benchmarks. Inference is heavily optimized, achieving 3–6x throughput improvements over Qwen3 baselines on H100s. Weights are available on Hugging Face and AI Kosh under Apache 2.0.
Table of contents
Architecture
Training
Benchmarks
Inference Optimization
Demos
🏓 Courts near you (Benz Circle)
🛍️ Items to buy (cost details)
🎯 Advice for beginners
Conclusion
Acknowledgements