Best of Transformers: November 2024

  1.
    Video
    3Blue1Brown · 2y

    Large Language Models explained briefly

    The video explains what large language models (LLMs) are, how they work, and why training them is so demanding. An LLM predicts the next word in a sequence by assigning a probability to every candidate word, and it learns to do so from vast amounts of text. The introduction of transformers in 2017 made it possible to process text in parallel rather than one word at a time, which improved computational efficiency. Pre-training is followed by reinforcement learning from human feedback (RLHF) to refine the model's predictions. The scale of data and computation involved is enormous, and it is made practical by specialized hardware such as GPUs.
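
    As a rough illustration of "predicting the next word based on probabilities", the sketch below turns raw scores into a probability distribution with a softmax and then samples from it. The three-word vocabulary and the scores are made up for illustration; a real model produces scores over tens of thousands of tokens.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_word(vocab, logits, rng, temperature=1.0):
    """Pick one word at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores for continuing "The cat sat on the ..."
vocab = ["mat", "moon", "dog"]
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)
rng = random.Random(0)  # seeded so repeated runs behave the same
next_word = sample_next_word(vocab, logits, rng)
print(dict(zip(vocab, probs)), next_word)
```

    Lowering `temperature` concentrates probability on the highest-scoring word, while raising it makes less likely words show up more often.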

  2.
    Article
    gitconnected · 2y

    Let’s Build our own GPT Model from Scratch with PyTorch

    Learn how to build a basic Generative Pre-trained Transformer (GPT) model from scratch using PyTorch. This tutorial covers auto-regressive models, character-level tokenization, data batching, and training on text in the style of William Shakespeare. It walks through a detailed implementation of a bigram language model, including multi-head attention, the forward pass and training loop, and the generation of new text tokens.
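
    The character-level tokenization and batching steps mentioned above can be sketched without PyTorch. This is not the tutorial's code; the sample text, the names `encode`, `decode`, and `get_batch`, and the sizes are assumptions chosen for illustration.

```python
import random

text = "To be, or not to be, that is the question."

# Character-level vocabulary: every distinct character gets an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    """Map a string to a list of character ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map a list of character ids back to a string."""
    return "".join(itos[i] for i in ids)

data = encode(text)

def get_batch(data, block_size, batch_size, rng):
    """Sample random windows; each target is its input shifted by one."""
    xs, ys = [], []
    for _ in range(batch_size):
        start = rng.randrange(len(data) - block_size)
        xs.append(data[start:start + block_size])
        ys.append(data[start + 1:start + block_size + 1])
    return xs, ys

rng = random.Random(0)
xs, ys = get_batch(data, block_size=8, batch_size=4, rng=rng)
print(repr(decode(xs[0])), "->", repr(decode(ys[0])))
```

    Shifting the targets by one position is what makes the setup auto-regressive: at every position the model is trained to predict the character that comes next.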

  3.
    Article
    AIModels.fyi · 2y

    Get ready to lose to Transformers on Lichess

    A study trains large transformer models to play chess by generalizing strategies rather than memorizing moves, using a dataset called ChessBench built from 10 million human games. The models reached near-grandmaster level without relying on explicit search, which suggests transformers could be applied to other strategic planning tasks.

  4.
    Article
    Towards AI · 2y

    Transformers For Images!!

    This post explores how transformers are applied to images in computer vision, detailing three main approaches: Pixel Transformers, Vision Transformers (ViT) by Google Brain, and Swin Transformers by Microsoft. It highlights the limitations of CNNs and describes ways to reduce the computational cost of attention over images, such as attending over image patches instead of individual pixels, restricting attention to local windows, and merging patches hierarchically.
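
    To see why patches help, the sketch below splits a small grid into non-overlapping patches and compares token counts for pixel-level and patch-level inputs. The 224x224 image size and 16x16 patch size are common ViT settings used here as an assumption, and `split_into_patches` is a hypothetical helper rather than code from the post.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grid into flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

# Toy 8x8 "image" whose pixel values are just their row-major index.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = split_into_patches(image, patch_size=4)

# Token counts for an assumed 224x224 input.
pixel_tokens = 224 * 224
vit_tokens = (224 // 16) ** 2

# Full self-attention compares every token with every other token.
pixel_pairs = pixel_tokens ** 2
vit_pairs = vit_tokens ** 2
print(len(patches), pixel_tokens, vit_tokens, pixel_pairs // vit_pairs)
```

    Because full attention scales with the square of the token count, cutting the sequence from one token per pixel to one token per patch shrinks the attention matrix by several orders of magnitude.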

  5.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    Extending the Context Length of LLMs

    The post explains techniques for extending the context length of large language models (LLMs), highlighting methods such as sparse attention and flash attention. These techniques reduce the computational cost of processing longer context windows, making it feasible to handle many more tokens without a drastic increase in cost. It also discusses the role of positional embeddings, particularly rotary positional embeddings (RoPE), in preserving the relative positions of tokens.
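
    The relative-position property of RoPE can be checked numerically. The sketch below is a minimal pure-Python version, not the post's code: it rotates each pair of vector components by an angle proportional to the token position, using the commonly cited base of 10000 as an assumption. The query and key vectors are arbitrary example values.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive component pairs by position-dependent angles."""
    dim = len(vec)
    if dim % 2:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same offset between query and key (3 positions), different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
# Different offset (4 positions) for comparison.
score_other = dot(rope(q, 6), rope(k, 2))
print(score_near, score_far, score_other)
```

    The first two scores agree up to floating-point error because the attention score between rotated vectors depends only on the distance between the two positions, not on where they sit in the sequence.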