PyTorch 2.12 has been released with 2,926 commits from 457 contributors. Key highlights include up to 100x faster batched eigendecomposition on CUDA via updated cuSolver backend selection, a new device-agnostic torch.accelerator.Graph API unifying graph capture across CUDA, XPU, and out-of-tree backends, and torch.export now supporting Microscaling (MX) quantization formats for deploying compressed models. The Adagrad optimizer gains fused=True support, and torch.cond control flow can now be captured inside CUDA Graphs using CUDA 12.4 conditional IF nodes. ROCm users gain expandable memory segments, rocSHMEM symmetric memory collectives, and FlexAttention pipelining with 5-26% speedups. Apple MPS gets ahead-of-time Metal-4 shader compilation. TorchScript deprecation continues, and the CUDA 12.8 wheel is deprecated in favor of CUDA 13.0+.

11m read time. From pytorch.org
