TorchInductor now supports CuteDSL (NVGEMM) as a fourth autotuning backend for matrix multiplications, alongside Triton, CUTLASS C++, and cuBLAS. CuteDSL is a Python-based DSL built on the same abstractions as CUTLASS C++ but compiles via a custom Python-to-MLIR compiler, achieving compile times comparable to Triton while

From pytorch.org
Table of contents
Introduction
Strategy: Why We Target GEMMs
Background: How TorchInductor Generates GEMMs
Architecture of the CuteDSL Backend
Results
CuteDSL Backend Supported Features
How You Can Try It
Future Work
Conclusion
