Best of Hugging Face: September 2025

  1. Article
    Hugging Face · 32w

    Jupyter Agents: training LLMs to reason with notebooks

    Hugging Face developed Jupyter Agent, a system that trains small language models to perform data science tasks by executing code in Jupyter notebooks. The training pipeline starts from 2TB of Kaggle notebooks, applies deduplication and quality filtering, generates synthetic question-answer pairs, and fine-tunes Qwen3-4B models on the result. The approach achieved 75% accuracy on the easy tier of the DABStep benchmark, demonstrating that smaller models can become effective data science agents given the right training data and scaffolding. The project releases open-source datasets, trained models, and a simplified 200-line scaffolding system.
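The core of a scaffold like the one described (model proposes a code cell, a kernel executes it, the output is fed back until the model answers) can be sketched in a few lines. This is a minimal illustration with a stubbed model interface, not the project's actual 200-line implementation; `execute_cell` and `run_agent` are hypothetical names.

```python
import contextlib
import io


def execute_cell(code, namespace):
    """Run one code cell and capture its stdout, like a notebook kernel would."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)  # assumption: cells are trusted; real systems sandbox this
    return buf.getvalue()


def run_agent(llm, question, max_steps=5):
    """Minimal execute-and-observe loop: the model emits either a code cell
    or a line starting with 'FINAL:' carrying its answer."""
    namespace = {}  # persistent state shared across cells, as in a notebook
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm("\n".join(transcript))
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        output = execute_cell(reply, namespace)
        transcript.append(f"Code:\n{reply}\nOutput:\n{output}")
    return None  # step budget exhausted without a final answer
```

In practice `llm` would wrap a fine-tuned model such as Qwen3-4B behind a generation API; here it can be any callable mapping a prompt string to a reply string.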

  2. Article
    Hugging Face · 30w

    Gaia2 and ARE: Empowering the community to study agents

    Hugging Face introduces Gaia2, an advanced AI agent benchmark that goes beyond read-only tasks to evaluate interactive behaviors in real-world conditions. Unlike its predecessor GAIA, Gaia2 tests agents on complex scenarios including ambiguity handling, time-sensitive actions, and noise tolerance using a smartphone mock-up environment. The release includes the open-source Agent Research Environments (ARE) framework for running, debugging, and evaluating agents with structured trace recording. Current results show GPT-5 as the top performer, while temporal reasoning remains challenging for all models. The platform enables researchers to create custom scenarios and connect their own tools via MCP integration.