The post discusses applying block sparsity to the MLP modules of vision transformers, showing promising speedups with minimal accuracy drop. It explains the training and inference steps, provides microbenchmarking results, and reports the speedup and accuracy achieved on a specific ViT model. Future steps and potential improvements are also outlined.
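To illustrate the core idea (this is a hedged sketch, not the post's actual code): block sparsity stores only the nonzero blocks of a weight matrix, so compute over zero blocks is skipped entirely during a matrix-vector product, which is where the speedup comes from. The block size, matrix values, and helper names below are all assumptions for illustration.

```python
# Illustrative block-sparse matrix-vector product: store only nonzero
# blocks and never visit zero blocks (the source of the claimed speedup).
B = 2  # assumed block size; real deployments use larger blocks

# Dense 4x4 matrix whose bottom-left 2x2 block is entirely zero.
dense = [
    [1, 2, 0, 1],
    [3, 4, 1, 0],
    [0, 0, 5, 6],
    [0, 0, 7, 8],
]

def to_block_sparse(mat, b):
    """Keep only blocks containing a nonzero entry, keyed by (bi, bj)."""
    n = len(mat)
    blocks = {}
    for bi in range(n // b):
        for bj in range(n // b):
            blk = [row[bj*b:(bj+1)*b] for row in mat[bi*b:(bi+1)*b]]
            if any(any(v != 0 for v in row) for row in blk):
                blocks[(bi, bj)] = blk
    return blocks

def block_sparse_matvec(blocks, x, b, n):
    """Multiply the block-sparse matrix by vector x, skipping zero blocks."""
    y = [0.0] * n
    for (bi, bj), blk in blocks.items():  # zero blocks never appear here
        for i in range(b):
            for j in range(b):
                y[bi*b + i] += blk[i][j] * x[bj*b + j]
    return y

blocks = to_block_sparse(dense, B)
print(len(blocks))                               # 3 of 4 blocks stored
print(block_sparse_matvec(blocks, [1, 1, 1, 1], B, 4))  # [4.0, 8.0, 11.0, 15.0]
```

In a ViT MLP, the same idea applies to the two large linear layers, where pruning whole blocks (rather than individual weights) lets hardware-friendly block-sparse kernels deliver real wall-clock gains.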
6-minute read · From pytorch.org