NVIDIA CUDA 13.3 introduces several major updates: CUDA Tile programming in C++ for high-level, portable GPU kernel development on Hopper and other architectures; CUDA Python 1.0 with stable semantic versioning, green contexts, process checkpointing, and IPC support; and CompileIQ, a new compiler auto-tuning framework using evolutionary/genetic algorithms that delivers up to 15% speedup on GEMM and attention kernels. The release also adds C++23 support in NVCC/NVRTC, CCCL 3.3 with DLPack/mdspan tensor interoperability, new parallel algorithms (FindIf, segmented scan, binary search), a comprehensive random number distribution library, and a new Numba CUDA MLIR backend with ~1.4x faster JIT compile times. Math libraries (cuBLAS, cuSPARSE, cuSOLVER) receive performance improvements for Blackwell and Hopper architectures.
Table of contents
Release of CUDA Tile C++
Release of CUDA Python 1.0
Try CUDA Python today
CompileIQ launched
Math libraries
CCCL
Compilers/NVCC
More CUDA 13.3 enhancements
Get started