The Illustrated Transformer

The Transformer is a model that uses attention to speed up training and improve the quality of neural machine translation. Its architecture lends itself to parallelization and consists of a stack of encoders and a stack of decoders. Each encoder contains a self-attention layer followed by a feed-forward neural network, and each decoder adds an encoder-decoder attention layer between the two. Input words are first turned into embedding vectors. Multi-headed attention lets the model attend to different positions of the input at once, using several sets of query, key, and value projections. Positional encodings are added to the embeddings so the model has information about word order. During training, backpropagation iteratively adjusts the weights so that the output probability distributions move closer to the target translations.
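As a rough illustration of the self-attention and positional-encoding ideas summarized above, here is a minimal single-head sketch in plain Python (standard library only, so it is slow and meant purely for reading). The function names, the toy embeddings, and the identity projection matrices are assumptions made for this example and do not come from the article. The positional encoding follows the interleaved sine/cosine formulation of the original "Attention Is All You Need" paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has shape (seq_len, d_model); each weight matrix has shape (d_model, d_k).
    Returns the attended output and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # (seq_len, seq_len) query-key dot products
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


# Toy usage: two "words" with hypothetical 4-dimensional embeddings.
x = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
pe = positional_encoding(2, 4)
x = [[a + b for a, b in zip(row, p)] for row, p in zip(x, pe)]  # add order information

# Identity projections stand in for learned weight matrices (an assumption).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` is a probability distribution over input positions, which is what lets every word draw on every other word in a single step. Multi-headed attention would repeat `self_attention` with several independent sets of projection matrices and concatenate the results.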

#machine-learning #deep-learning #nlp #neural-networks

Jul 02, 2024 • 21m read time • From jalammar.github.io

Table of contents

A High-Level Look
Bringing The Tensors Into The Picture
Now We’re Encoding!
Self-Attention at a High Level
Self-Attention in Detail
Matrix Calculation of Self-Attention
The Beast With Many Heads
Representing The Order of The Sequence Using Positional Encoding
The Residuals
The Decoder Side
The Final Linear and Softmax Layer
Recap Of Training
The Loss Function
Go Forth And Transform
Acknowledgements
