Comprehensive lecture notes covering NLP fundamentals through the Transformer architecture. Explains tokenization strategies (word-, subword-, and character-level), Word2vec embeddings trained via proxy tasks, RNN/LSTM limitations including vanishing gradients, and the attention mechanism that lets any token attend directly to any other.
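The "direct token connections" point can be made concrete with a minimal sketch of scaled dot-product self-attention. This is an illustrative implementation in NumPy, not code from the notes themselves; the function and variable names are our own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: every query scores against every key,
    # so any token can draw information from any other in a single step,
    # unlike an RNN, which must propagate it through intermediate states.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_q, n_k) similarity matrix
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional representations.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = attention(X, X, X)  # self-attention: Q = K = V = X
print(w.shape)  # (3, 3): each token directly weights all three tokens
```

The (3, 3) weight matrix is the point: token 0 connects to token 2 in one hop, with no vanishing-gradient path in between.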