Hugging Face developed Jupyter Agent, a system that trains small language models to perform data science tasks by executing code in Jupyter notebooks. They created a comprehensive pipeline starting with 2TB of Kaggle notebooks, applied deduplication and quality filtering, generated synthetic question-answer pairs, and
Table of contents
Primer: the DABStep Benchmark
First Baseline
Primer on Scaffolding
Training Pipeline
Dataset Pipeline
Training Pipeline
Results
Try Jupyter Agent Yourself
Next Steps