The post discusses the hardware requirements of GPUs for AI computation and introduces ThunderKittens, an embedded DSL that simplifies writing high-performance AI kernels. It explores the quirks of the NVIDIA H100 GPU and provides sample code for implementing flash attention and linear attention kernels using ThunderKittens.
Table of contents
What's in an H100?
ThunderKittens
Tiles Seem Like a Good Idea
Tiles Seem Pretty General