NVIDIA CUDA 13.3 introduces several major updates: CUDA Tile programming in C++ for high-level, portable GPU kernel development on Hopper and other architectures; CUDA Python 1.0 with stable semantic versioning, green contexts, process checkpointing, and IPC support; and CompileIQ, a new compiler auto-tuning framework using evolutionary/genetic algorithms that delivers up to 15% speedup on GEMM and attention kernels. The release also adds C++23 support in NVCC/NVRTC, CCCL 3.3 with DLPack/mdspan tensor interoperability, new parallel algorithms (FindIf, segmented scan, binary search), a comprehensive random number distribution library, and a new Numba CUDA MLIR backend with ~1.4x faster JIT compile times. Math libraries (cuBLAS, cuSPARSE, cuSOLVER) receive performance improvements for Blackwell and Hopper architectures.

13m read time
From developer.nvidia.com
Table of contents
Release of CUDA Tile C++
Release of CUDA Python 1.0
Try CUDA Python today
CompileIQ launched
Math libraries
CCCL
Compilers/NVCC
More CUDA 13.3 enhancements
Get started
