VectorWare demonstrates running Rust's std::thread on the GPU for the first time, mapping each std::thread to a GPU warp. Execution starts with only Warp 0 active (running main); additional warps are woken by thread::spawn(), and a caller that joins a thread blocks in JoinHandle::join() until that thread finishes. This preserves Rust's borrow checker semantics, prevents warp divergence by construction, and unlocks large portions of the Rust ecosystem (rayon, tokio, etc.) for GPU use. The post covers the implementation details, benefits (no divergence, familiar Rust abstractions), and downsides (finite warps, expensive synchronization, memory constraints). The approach targets NVIDIA GPUs but is portable to Vulkan subgroups and HIP/ROCm wavefronts.
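
To make the spawn/join model concrete, here is a minimal sketch of ordinary std::thread code of the kind the summary describes. It is plain host Rust, not VectorWare's code: the function name `parallel_sum`, the `workers` parameter, and the chunking scheme are illustrative assumptions. Under the mapping described above, each `thread::spawn` call would presumably wake one warp and each `join` would block the caller until that warp's closure returns.

```rust
use std::thread;

/// Hypothetical helper (not from the post): sums `data` by giving each
/// spawned thread its own owned chunk, then joining all of them.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    // Chunk length is at least 1 so `chunks` never receives 0.
    let chunk_len = data.len().div_ceil(workers.max(1)).max(1);

    let handles: Vec<thread::JoinHandle<u64>> = data
        .chunks(chunk_len)
        .map(|chunk| {
            // `thread::spawn` requires a 'static closure, so each thread
            // takes ownership of a copy of its chunk.
            let owned = chunk.to_vec();
            thread::spawn(move || owned.iter().sum::<u64>())
        })
        .collect();

    // `join` blocks the caller until the corresponding thread finishes.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum::<u64>()
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    println!("{}", parallel_sum(&data, 4)); // prints 500500
}
```

Because the closure must own what it touches, the borrow checker rejects any attempt to share `data` mutably across threads. The summary says that guarantee carries over to the GPU.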

11m read time. From vectorware.com
Table of contents
Execution models
Functions as programs
Why support std::thread on the GPU?
Why not map std::thread to GPU threads?
A world first: std::thread on the GPU
Implementation
Benefits
Downsides
Is VectorWare only focused on Rust?
Follow along