A CppCon 2025 talk exploring whether standard C++ parallel algorithms (introduced in C++17) can replace CUDA for GPU-accelerated computing. The speaker walks through GPU fundamentals, the evolution of memory management models (pinned memory, managed memory, HMM), and how the par_unseq execution policy enables running standard algorithms on GPUs with minimal code changes. Key topics include supported algorithms, hardware/OS requirements (Nvidia and AMD), performance pitfalls like divergent branching, atomic operation overhead, and alternating host/device memory access, plus practical limitations such as no exceptions, no dynamic polymorphism, and random-access-iterator requirements. The conclusion is that writing GPU-accelerated C++ has never been easier, but understanding hardware fundamentals is still necessary to avoid performance traps.
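To make the "minimal code changes" point concrete, here is a small sketch of the kind of code the talk describes: a SAXPY-style std::transform using the par_unseq execution policy. The toolchain details (e.g. building with NVIDIA's nvc++ and -stdpar=gpu, or an equivalent AMD/ROCm setup) are assumptions for illustration, not taken from the talk itself.

```cpp
// Minimal sketch: standard C++17 parallel algorithm that an
// offload-capable compiler (assumed: nvc++ -stdpar=gpu or similar)
// can run on the GPU -- no CUDA-specific code required.
#include <algorithm>
#include <execution>
#include <vector>
#include <cstdio>

int main() {
    const std::size_t n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 3.0f);

    // The only "GPU" part is the execution policy plus the compiler
    // flags used to build it; the algorithm call is plain standard C++.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });

    std::printf("y[0] = %f\n", y[0]);  // expect 5.0
}
```

Note that the lambda captures by value and the iterators are random-access, in line with the limitations the talk lists (no exceptions, no dynamic polymorphism, random-access-iterator requirements).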