HuggingFace built an agent skill that teaches AI coding agents (Claude, Codex) to write production-ready CUDA kernels with PyTorch bindings. The skill packages domain expertise about GPU architectures, memory patterns, and library integration into ~550 tokens of structured guidance. Testing on LTX-Video (diffusers) and Qwen3-8B
Table of contents
Why a skill for kernels?Installing the skillWhat is in the skillBenchmarking the kernels: Diffusers (LTX-Video on H100)Benchmarking the kernels: Transformers (Qwen3-8B on H100)Publishing your kernel to the HubConclusionResources2 Comments
Sort: