A conference talk by Ben Burtenshaw from Hugging Face exploring how coding agents can tackle AI systems engineering tasks. Three progressively complex use cases are presented: (1) using agents to write and distribute optimized CUDA kernels via the Hugging Face kernels library, achieving a 94% speedup on Qwen 3 8B for H100; (2) zero-shot LLM fine-tuning where an agent trains a model on the Hugging Face Hub; and (3) a multi-agent AutoLab setup inspired by Karpathy's Auto Research project, where specialized agents (researcher, planner, workers, reporter) autonomously run ML experiments, track results via Trackio, and iterate on training scripts. Key takeaways: agents work best with open primitives and well-exposed APIs rather than abstracted ones, and the Hugging Face Hub now provides the storage, compute, and tracking infrastructure needed for agentic ML workloads.

18m watch time
