Learn about BPE, the tokenization algorithm used by large language models like GPT, LLaMA, and RoBERTa. Walk through a worked example and explore Python implementations.
Table of contents
Training a BPE tokenizerPython implementation, from Sennrich et al.Streaming implementation, using a trieFurther commentsSort: