Comprehensive lecture notes covering NLP fundamentals through the Transformer architecture. Explains tokenization strategies (word-, subword-, and character-level), Word2vec embeddings trained via proxy tasks, RNN/LSTM limitations including vanishing gradients, and the attention mechanism that lets any token attend directly to any other.
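The "direct token connections" point can be made concrete with a minimal sketch of scaled dot-product self-attention. This is an illustrative implementation in NumPy, not code from the notes themselves; the function and variable names are our own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: every query scores against every key,
    # so any token can draw information from any other in a single step,
    # unlike an RNN, which must propagate it through intermediate states.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_q, n_k) similarity matrix
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional representations.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = attention(X, X, X)  # self-attention: Q = K = V = X
print(w.shape)  # (3, 3): each token directly weights all three tokens
```

The (3, 3) weight matrix is the point: token 0 connects to token 2 in one hop, with no vanishing-gradient path in between.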