Meta AI's Byte Latent Transformer (BLT) is a tokenizer-free LLM architecture that processes raw bytes instead of tokens. Rather than relying on a fixed tokenizer, BLT dynamically groups bytes into patches using entropy-based segmentation: a small byte-level model estimates next-byte prediction uncertainty and places patch boundaries where that uncertainty is high, allocating more compute to harder-to-predict sequences. The architecture has three components: a lightweight Local Encoder that pools bytes into patch representations via cross-attention, a large Latent Transformer that processes the patch sequence, and a Local Decoder that converts patch outputs back into byte sequences. Compute-matched benchmarks show BLT outperforming LLaMA 2/3 models on bits-per-byte, with larger patch sizes (6–8 bytes) proving especially efficient at scale.
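
As a rough illustration of the entropy-based patching idea, here is a minimal Python sketch of the global-threshold variant: a new patch starts whenever the small byte-level model's next-byte entropy exceeds a threshold. The `next_byte_probs` callable and the threshold value are hypothetical placeholders standing in for the paper's trained entropy model, not its actual implementation.

```python
import math

ENTROPY_THRESHOLD = 2.0  # bits; hypothetical value, tuned in practice


def shannon_entropy(probs):
    """Entropy (in bits) of a probability distribution over the 256 byte values."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def patch_boundaries(data: bytes, next_byte_probs):
    """Split a byte sequence into patches, starting a new patch wherever the
    small model's next-byte entropy crosses the threshold, i.e. where the next
    byte is hard to predict and more compute should be allocated.

    `next_byte_probs(prefix)` is assumed to return a 256-entry distribution
    over the next byte given the prefix seen so far (a stand-in for the small
    byte-level entropy model).
    """
    patches, start = [], 0
    for i in range(1, len(data)):
        probs = next_byte_probs(data[:i])          # distribution over byte i
        if shannon_entropy(probs) > ENTROPY_THRESHOLD:
            patches.append(data[start:i])          # high uncertainty: boundary
            start = i
    patches.append(data[start:])                   # flush the final patch
    return patches
```

The design consequence is that predictable runs (e.g. the tail of a common word) collapse into long patches processed once by the Latent Transformer, while surprising positions get short patches and thus more forward passes per byte.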
