PyTorch 2.11 has been released with 2723 commits from 432 contributors. Key highlights:

- Differentiable collectives for distributed training, enabling backpropagation through collective operations (sketched after this list)
- FlexAttention with a FlashAttention-4 backend on Hopper and Blackwell GPUs, delivering 1.2×–3.2× speedups over the Triton backend (example below)
- Expanded MPS operator coverage for Apple Silicon
- RNN/LSTM GPU export support via torch.export (example below)
- XPUGraph for Intel GPU execution optimization
- FP16 GEMM support on CPU via OpenBLAS
- Device-side assertions and TopK optimizations for ROCm
- CUDA 13 as the new default
- Deprecation of TorchScript in favor of torch.export
- A faster release cadence of every two months starting in 2026
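The sketches below are illustrative, not code from the release notes; anything that is not a long-standing PyTorch API is flagged as an assumption.

First, FlexAttention. Backend selection (Triton vs. FlashAttention-4) happens inside the library, so user code is unchanged; a minimal causal-attention sketch using the existing `torch.nn.attention.flex_attention` API:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compiling flex_attention fuses the score_mod callable into one kernel.
flex_attention = torch.compile(flex_attention)

def causal(score, b, h, q_idx, kv_idx):
    # Mask out future key positions to get causal attention.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))
out = flex_attention(q, k, v, score_mod=causal)
```

Next, torch.export as the TorchScript replacement, here exporting an LSTM model on CUDA in line with the RNN/LSTM GPU export item above (the `Tagger` module and its shapes are invented for illustration):

```python
import torch

class Tagger(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = torch.nn.Linear(64, 10)

    def forward(self, x):
        out, _ = self.lstm(x)   # run the recurrent encoder
        return self.head(out)   # per-step logits

model = Tagger().eval().cuda()
example_inputs = (torch.randn(4, 16, 32, device="cuda"),)

# torch.export captures a whole-program graph ahead of time,
# the role previously filled by torch.jit.trace / torch.jit.script.
exported = torch.export.export(model, example_inputs)
print(exported)
```

Finally, differentiable collectives. PyTorch has long shipped autograd-aware functional collectives in `torch.distributed.nn.functional`; whether the 2.11 feature extends that surface or adds a new one is not stated here, so this single-process sketch only illustrates the concept of backpropagating through an all-reduce:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.nn.functional import all_reduce

# Single-process "cluster" so the sketch runs without a launcher.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

x = torch.ones(4, requires_grad=True)
y = all_reduce(x)        # autograd-aware, unlike in-place dist.all_reduce
y.sum().backward()       # gradients flow back through the collective
print(x.grad)            # tensor([1., 1., 1., 1.]) with world_size=1

dist.destroy_process_group()
```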
API-Unstable Features