Hugging Face developed Jupyter Agent, a system that trains small language models to perform data science tasks by executing code in Jupyter notebooks. They created a comprehensive pipeline starting with 2TB of Kaggle notebooks, applied deduplication and quality filtering, generated synthetic question-answer pairs, and

15m read time • From huggingface.co
Table of contents
🏁 Primer: the DABStep Benchmark
🎯 First Baseline
🔧 Primer on Scaffolding
🏃‍♂️ Training Pipeline
⚙️ Dataset Pipeline
🏃‍♂️ Training Pipeline
📊 Results
Try Jupyter Agent Yourself
🔮 Next Steps
