Best of Hugging Face · January 2026

  1.
    Article
    Hugging Face · 12w

    Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek

    China's open-source AI ecosystem has shifted toward Mixture-of-Experts (MoE) architectures as the default choice, prioritizing cost-performance balance over maximum capability. Leading organizations expanded beyond text models into multimodal domains (video, audio, 3D), with growing emphasis on small models (0.5B-30B parameters) for practical deployment. Apache 2.0 became the standard license, reducing friction for production use. A significant strategic shift emerged toward hardware-first development, with models increasingly optimized for domestic Chinese chips (Huawei Ascend, Cambricon, Baidu Kunlun) in both inference and training. Companies are open-sourcing production-grade serving systems and infrastructure, moving competition from isolated model performance to full-stack ecosystem design.

  2.
    Article
    Hugging Face · 13w

    Differential Transformer V2

    Differential Transformer V2 introduces a redesigned attention mechanism that doubles the number of query heads while keeping the number of key-value heads fixed, eliminating the need for custom kernels and achieving faster decoding. The architecture removes per-head RMSNorm to improve training stability, introduces token-level and head-level lambda projections to overcome softmax constraints, and eliminates attention sinks. Production-scale experiments on trillion-token datasets show 0.02-0.03 lower language-modeling loss, fewer gradient spikes under large learning rates, and reduced activation outliers compared to standard Transformers, while saving roughly 25% of attention-module parameters.
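    The core idea behind differential attention can be illustrated with a minimal single-head sketch: two query projections produce two softmax attention maps, and the output is their difference, scaled by a lambda predicted per token. All names and dimensions below are illustrative assumptions, not the paper's actual parameterization.

    ```python
    import math
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)

    seq, d = 6, 16          # hypothetical sequence length and model width
    x = torch.randn(seq, d)  # toy input activations

    # Two query projections (the "doubled query heads") share one
    # key/value projection, so no extra KV cache is needed at decode time.
    q1 = x @ torch.randn(d, d)
    q2 = x @ torch.randn(d, d)
    k = x @ torch.randn(d, d)
    v = x @ torch.randn(d, d)

    a1 = F.softmax(q1 @ k.T / math.sqrt(d), dim=-1)
    a2 = F.softmax(q2 @ k.T / math.sqrt(d), dim=-1)

    # Token-level lambda: a per-token mixing weight predicted from the
    # input (a sketch of the "lambda projection" idea; the exact form
    # in the paper is assumed here).
    lam = torch.sigmoid(x @ torch.randn(d, 1))  # shape (seq, 1), in (0, 1)

    # Differential attention: subtracting the second map cancels common
    # attention noise, which is why attention sinks can be dropped.
    out = (a1 - lam * a2) @ v  # shape (seq, d)
    ```

    Because the difference of two softmax maps can take negative values and need not sum to one, the mechanism escapes the constraints of a single softmax distribution; the lambda projections control how aggressively the second map is subtracted.
    
    
    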