The Transformer is arguably the most influential neural network architecture of the last decade, powering the current boom in generative AI.

In this video, we will review the basic ideas of the original encoder-decoder Transformer architecture and understand the reasoning behind its design decisions.

Enjoy!

Slides download: https://www.dropbox.com/scl/fi/x7zkydvekohh0ej22si9u/Lec_03_Transformer_overview.pptx?rlkey=l16nj8bvl97y5dq39z7qfeaii&dl=0

Jia-Bin Huang

A comprehensive walkthrough of how Transformer neural networks work, covering tokenization, token embeddings, the attention mechanism (including queries, keys, and values), multi-head attention, positional encoding, residual connections, layer normalization, and the encoder-decoder architecture. Also compares encoder-only (BERT), decoder-only (GPT, LLaMA), and encoder-decoder (T5, BART) model families and their respective use cases.
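
For readers who want a concrete picture of the queries, keys, and values mentioned above, here is a minimal NumPy sketch of single-head scaled dot-product attention. It is an illustration under stated assumptions, not code from the video or slides: the function names, the toy sizes (4 tokens, 8 dimensions), and the random projection matrices `w_q`, `w_k`, `w_v` are all hypothetical stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: arrays of shape (num_tokens, d_k).
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(d_k)
    # Each row becomes a probability distribution over the tokens.
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted average of the value vectors.
    return weights @ v, weights

# Toy input: 4 token embeddings of dimension 8 (assumed sizes).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))

# Hypothetical projection matrices; in a real model these are learned.
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))

out, weights = scaled_dot_product_attention(tokens @ w_q, tokens @ w_k, tokens @ w_v)
print(out.shape, weights.shape)
```

Each row of `weights` sums to 1, so every output vector is a weighted mix of the value vectors. Multi-head attention runs several such heads in parallel on lower-dimensional projections and concatenates the results.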

But What Are Transformers?