NVIDIA released CUDA Tile IR, an open-source MLIR-based compiler infrastructure for optimizing CUDA kernels through tile-based computation patterns targeting tensor cores. The project includes a domain-specific MLIR dialect, Python bindings, bytecode serialization, and comprehensive testing. It requires CMake 3.20+, C++17, and

9m read time. From github.com
Table of contents
Core Components
CUDA Tile Specification
Building CUDA Tile
Testing
Integrating CUDA Tile Into Your Project
Example: Writing and Running a CUDA Tile IR Program
Contributions and Support
License
