TorchInductor now supports CuteDSL (NVGEMM) as a fourth autotuning backend for matrix multiplications, alongside Triton, CUTLASS C++, and cuBLAS. CuteDSL is a Python-based DSL built on the same abstractions as CUTLASS C++ but compiles via a custom Python-to-MLIR compiler, achieving compile times comparable to Triton while
Table of contents
- Introduction
- Strategy: Why We Target GEMMs
- Background: How TorchInductor Generates GEMMs
- Architecture of the CuteDSL Backend
- Results
- CuteDSL Backend Supported Features
- How You Can Try It
- Future Work
- Conclusion