Vision Transformers (ViT) make transformers practical for images by splitting each image into fixed-size patches rather than treating every pixel as a token. Feeding raw pixels would create an impractically large quadratic attention matrix (e.g., 65K×65K for a 256×256 image). Patches are flattened, projected via a trainable linear layer, combined with positional embeddings, and fed as a sequence to a standard transformer. A special learnable class token aggregates global image information for classification. Compared to CNNs, ViT has lower inductive bias — self-attention allows every patch to attend to every other patch at every layer, and even positional embeddings are learned from scratch rather than hand-designed.
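
The sketch below shows the input pipeline described above (patching, linear projection, class token, positional embeddings) in PyTorch. It is a minimal illustration, not the reference ViT implementation: the class name `ViTEmbedder` and the hyperparameters (image size 224, patch size 16, embedding dimension 768) are assumptions chosen for concreteness.

```python
import torch
import torch.nn as nn


class ViTEmbedder(nn.Module):
    """Illustrative ViT input embedding: patches -> tokens (assumed names/sizes)."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Trainable linear projection of flattened patches, expressed here as a
        # strided convolution (an equivalent and common formulation).
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable class token that aggregates global image information.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Learned positional embeddings for the class token plus all patches.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

    def forward(self, x):                         # x: (B, C, H, W)
        x = self.proj(x)                          # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)          # (B, N, D) sequence of patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)            # prepend class token
        return x + self.pos_embed                 # add positional embeddings


# A 224x224 image becomes 14*14 + 1 = 197 tokens of dimension 768,
# which are then fed to a standard transformer encoder.
tokens = ViTEmbedder()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 197, 768])
```

For classification, the transformer's output at the class-token position is typically passed to a small MLP head; the patch tokens carry per-patch features.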
