Hugging Face developed Jupyter Agent, a system that trains small language models to perform data science tasks by executing code in Jupyter notebooks. They created a comprehensive pipeline starting with 2TB of Kaggle notebooks, applied deduplication and quality filtering, generated synthetic question-answer pairs, and fine-tuned Qwen3-4B models. The approach achieved 75% accuracy on easy DABStep benchmark tasks, demonstrating that smaller models can become effective data science agents with proper training data and scaffolding. The project includes open-source datasets, trained models, and a simplified 200-line scaffolding system.
Table of contents
๐ Primer: the DABStep Benchmark๐ฏ First Baseline๐ง Primer on Scaffolding๐โโ๏ธ Training Pipelineโ๏ธ Dataset Pipeline๐โโ๏ธ Training Pipeline๐ ResultsTry Jupyter Agent Yourself๐ฎ Next StepsSort: