NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python Updates

NVIDIA CUDA 13.3 introduces three major updates: CUDA Tile programming in C++ for high-level, portable GPU kernel development on Hopper and other architectures; CUDA Python 1.0 with stable semantic versioning, green contexts, process checkpointing, and IPC support; and CompileIQ, a new compiler autotuning framework based on evolutionary (genetic) algorithms that delivers up to 15% speedup on GEMM and attention kernels. The release also adds C++23 support in NVCC and NVRTC, CCCL 3.3 with DLPack/mdspan tensor interoperability, new parallel algorithms (FindIf, segmented scan, binary search), a comprehensive random number distribution library, and a new Numba CUDA MLIR backend with roughly 1.4x faster JIT compile times. The math libraries (cuBLAS, cuSPARSE, cuSOLVER) receive performance improvements for the Blackwell and Hopper architectures.

#python #nvidia #gpu #c++ #cuda

Yesterday • 13 min read • From developer.nvidia.com

Table of contents

Release of CUDA Tile C++
Release of CUDA Python 1.0
Try CUDA Python today
CompileIQ launched
Math libraries
CCCL
Compilers/NVCC
More CUDA 13.3 enhancements
Get started
