Best of GPU: November 2024

  1.
    Article
    Jeff Geerling·2y

    LLMs accelerated with eGPU on a Raspberry Pi 5

    A stable patch for the `amdgpu` Linux kernel driver now allows AMD RX 400 through 7000 series GPUs to work with the Raspberry Pi 5, supporting both the Vulkan graphics and compute APIs. This guide provides steps to install `llama.cpp` with Vulkan support on the Pi 5. Smaller models perform well, while larger models face performance issues due to inefficient memory access translations by the `amdgpu` driver. The compact, power-efficient setup uses only about 10-12 W when idle, making it an attractive option for local machine learning tasks.

  2.
    Article
    DigitalOcean Community·2y

    Real-Time Audio Translation with OpenAI APIs on DigitalOcean GPU Droplets Using Open WebUI

    This tutorial guides you through deploying a real-time audio translation application that uses OpenAI APIs through Open WebUI, hosted on DigitalOcean GPU Droplets powered by NVIDIA H100 GPUs. Steps include creating a project and GPU Droplet, setting up Docker with GPU support, and configuring Open WebUI. The finished setup can use OpenAI models such as GPT-4o and Whisper for real-time audio translation and transcription.

  3.
    Article
    Jeff Geerling·2y

    AMD Radeon PRO W7700 running on Raspberry Pi

    The Pi community has successfully enabled AMD Radeon GPUs, including the 6000 and 7000 series, to work on the Raspberry Pi 5. Some modern AAA games run at low FPS on the Pi's hardware, while older games like Portal 2 run smoothly. This guide details how to set up and patch Raspberry Pi OS for AMD GPU support, including firmware installation and kernel recompilation. It also addresses hardware transcoding support and lists the hardware used for the setup.

  4.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack·2y

    Extending the Context Length of LLMs

    The post explains techniques for extending the context length of large language models (LLMs), highlighting methods like sparse attention and flash attention. These techniques reduce the computational cost of processing longer context windows, making it feasible to handle many more tokens without a drastic increase in cost. The post also discusses why positional embeddings matter, particularly rotary positional embeddings (RoPE), which preserve the relative positions of tokens and the relationships between them.
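
To make the RoPE point concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which adjacent pairs of dimensions are rotated by an angle proportional to the token position, with a base of 10000; the summarized post may present it differently. The names `rope`, `dot`, `q`, and `k` are illustrative and do not come from the post.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle pos * base**(-i/d).

    Assumed formulation: adjacent-pair rotation with frequencies
    base**(-2j/d) for pair index j (here i = 2j).
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors for illustration.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]

# Both pairs of positions are 3 tokens apart, at very different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))

# The two scores agree up to floating-point error.
print(abs(score_near - score_far) < 1e-9)
```

Because each pair of dimensions is rotated rather than shifted, the score between a rotated query and a rotated key depends only on how far apart the two tokens are, not on where they sit in the sequence. That property is what context-extension methods try to keep intact when they rescale positions.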