Best of Hugging Face: September 2025

  1. Article
    Hugging Face · 32w

    Jupyter Agents: training LLMs to reason with notebooks

    Hugging Face developed Jupyter Agent, a system that trains small language models to perform data science tasks by executing code in Jupyter notebooks. The training pipeline starts from 2TB of Kaggle notebooks, applies deduplication and quality filtering, generates synthetic question-answer pairs, and fine-tunes Qwen3-4B models on the result. The approach achieved 75% accuracy on the easy tier of the DABStep benchmark, demonstrating that smaller models can become effective data science agents given the right training data and scaffolding. The project releases open-source datasets, trained models, and a simplified 200-line scaffolding system.
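The core of a scaffold like the one described (model proposes a code cell, a kernel executes it, the output is fed back until the model answers) can be sketched in a few lines. This is a minimal illustration with a stubbed model interface, not the project's actual 200-line implementation; `execute_cell` and `run_agent` are hypothetical names.

```python
import contextlib
import io


def execute_cell(code, namespace):
    """Run one code cell and capture its stdout, like a notebook kernel would."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)  # assumption: cells are trusted; real systems sandbox this
    return buf.getvalue()


def run_agent(llm, question, max_steps=5):
    """Minimal execute-and-observe loop: the model emits either a code cell
    or a line starting with 'FINAL:' carrying its answer."""
    namespace = {}  # persistent state shared across cells, as in a notebook
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm("\n".join(transcript))
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        output = execute_cell(reply, namespace)
        transcript.append(f"Code:\n{reply}\nOutput:\n{output}")
    return None  # step budget exhausted without a final answer
```

In practice `llm` would wrap a fine-tuned model such as Qwen3-4B behind a generation API; here it can be any callable mapping a prompt string to a reply string.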

  2. Article
    Hugging Face · 30w

    Gaia2 and ARE: Empowering the community to study agents

    Hugging Face introduces Gaia2, an advanced AI agent benchmark that goes beyond read-only tasks to evaluate interactive behaviors in real-world conditions. Unlike its predecessor GAIA, Gaia2 tests agents on complex scenarios including ambiguity handling, time-sensitive actions, and noise tolerance using a smartphone mock-up environment. The release includes the open-source Agent Research Environments (ARE) framework for running, debugging, and evaluating agents with structured trace recording. Current results show GPT-5 as the top performer, while temporal reasoning remains challenging for all models. The platform enables researchers to create custom scenarios and connect their own tools via MCP integration.