Meta introduces BLT (Byte Latent Transformer), an architecture that scales language models by operating directly on raw bytes instead of breaking text into tokens. Rather than relying on a fixed vocabulary, BLT dynamically groups bytes into patches, which improves performance, particularly on character-level tasks, and uses compute more efficiently. The model handles noisy, unpredictable text well and scales while matching the compute budget of token-based models.
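To make the dynamic grouping concrete, here is a minimal sketch of entropy-driven byte patching: a new patch starts wherever a next-byte predictor becomes uncertain. The `toy_entropy` function and the threshold value are illustrative assumptions, not the model's actual components; a real system would use a small byte-level language model to estimate entropy.

```python
import math

def byte_entropy(next_byte_probs):
    """Shannon entropy (bits) of a next-byte probability distribution."""
    return -sum(p * math.log2(p) for p in next_byte_probs if p > 0)

def dynamic_patches(data: bytes, entropy_fn, threshold: float = 2.0):
    """Group raw bytes into variable-length patches: start a new patch
    whenever the estimated next-byte entropy crosses the threshold,
    i.e. where the text becomes hard to predict."""
    patches, current = [], bytearray()
    for i, b in enumerate(data):
        if current and entropy_fn(data, i) > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

# Toy stand-in for an entropy model (hypothetical): treat whitespace
# as a high-entropy boundary so patches roughly align with words.
def toy_entropy(data, i):
    return 3.0 if data[i] in b" \n" else 1.0

print(dynamic_patches(b"hello world foo", toy_entropy))
# → [b'hello', b' world', b' foo']
```

Predictable runs of bytes collapse into long patches while unpredictable regions get short ones, which is how a byte-level model can spend compute where the input is hardest.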

From notes.aimodels.fyi (3-minute read).
