Hugging Face developed Jupyter Agent, a system that trains small language models to perform data science tasks by executing code in Jupyter notebooks. They created a comprehensive pipeline starting with 2TB of Kaggle notebooks, applied deduplication and quality filtering, generated synthetic question-answer pairs, and fine-tuned Qwen3-4B models. The approach achieved 75% accuracy on easy DABStep benchmark tasks, demonstrating that smaller models can become effective data science agents with proper training data and scaffolding. The project includes open-source datasets, trained models, and a simplified 200-line scaffolding system.

15m read time • From huggingface.co
Table of contents
🏁 Primer: the DABStep Benchmark
🎯 First Baseline
🔧 Primer on Scaffolding
🏃‍♂️ Training Pipeline
⚙️ Dataset Pipeline
🏃‍♂️ Training Pipeline
📊 Results
Try Jupyter Agent Yourself
🔮 Next Steps
