Best of GPU — 2024
- 1
- 2
Noted·2y
FileFlows - Self-Hosted Media Conversion Guide
FileFlows is a self-hosted solution to manage and compress media files, saving you substantial disk space. This guide covers setting up FileFlows on Pop!_OS with NVIDIA GPU support via Docker Compose, creating custom processing flows, and configuring libraries for media conversion. It highlights how GPU acceleration speeds up file processing, shows how to add the necessary plugins, and walks through a detailed example flow for converting MKV to MP4 that demonstrates significant disk space savings.
- 3
- 4
Hugging Face·1y
Visualize and understand GPU memory in PyTorch
This tutorial explains how to visualize and understand GPU memory usage in PyTorch during model training. It provides step-by-step instructions on generating and interpreting memory profiles using PyTorch's built-in tools. The tutorial also covers how to estimate and optimize memory requirements for training large models, offering practical tips to manage GPU memory efficiently.
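As a rough illustration of the estimation step mentioned above, the sketch below adds up the memory held by weights, gradients, and optimizer state. It assumes full-precision (4-byte) parameters and an Adam-style optimizer that keeps two extra values per parameter; the function name `estimate_training_bytes` is hypothetical, and activations and allocator overhead are deliberately left out, so real usage will be higher.

```python
def estimate_training_bytes(num_params: int, bytes_per_param: int = 4) -> dict:
    """Rough lower bound on training memory, excluding activations."""
    weights = num_params * bytes_per_param
    gradients = num_params * bytes_per_param
    # Assumption: an Adam-style optimizer storing two moments per parameter.
    optimizer_states = 2 * num_params * bytes_per_param
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer_states": optimizer_states,
        "total": weights + gradients + optimizer_states,
    }


if __name__ == "__main__":
    # Hypothetical 1-billion-parameter model.
    for name, value in estimate_training_bytes(1_000_000_000).items():
        print(f"{name}: {value / 1e9:.1f} GB")
```

Under these assumptions the total works out to 16 bytes per parameter before any activations are counted.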
- 5
Community Picks·1y
Universal, Pure-GPU HTML Renderer
Ultralight is a GPU-accelerated toolkit designed to embed modern HTML in games and native applications. It offers ultra-fast rendering either directly on the GPU or on the CPU for ease of integration. Ultralight is highly portable, with support for multiple platforms including Windows, macOS, Linux, PlayStation, Xbox, and ARM64 devices. The toolkit is built in collaboration with leading game studios and provides deep GPU integration, transparent rendering, custom image compositing, and more. It is also optimized for native app developers, offering consistent performance across platforms, automatic window management, and seamless JavaScript-native code integration.
- 6
Hacker News·2y
Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU!
Llama3 70B, which the post calls the strongest open-source LLM, can run on a single 4GB GPU using AirLLM. The post provides installation and code instructions for setting up the model. Llama3’s architecture remains the same but benefits from improved training methods and a massive increase in training data quantity and quality. Comparisons show that Llama3 70B performs close to GPT-4 and Claude3 Opus. The success of Llama3 highlights the ongoing competition between open-source and closed-source models and stresses the importance of data quality in training AI models.
- 7
AIModels.fyi·2y
🥇Top ML papers of the week
Discover the top trending machine learning papers from Aug 23 to Aug 30, as featured by AIModels.fyi. Highlights include a novel method for zero- and few-shot biomedical named entity recognition using transformers, OBatcher for simplifying batch-parallel data structures in OCaml, a guide to avoiding ML research pitfalls, a new data quality metric called the diversity coefficient, and insights into GPU-to-GPU communication in supercomputers.
- 8
Community Picks·2y
How to Run an LLM Locally with Pieces
The post explains how to run Local Large Language Models (LLLMs) within Pieces for Developers. It discusses the demand for secure and efficient machine learning solutions, hardware requirements for running LLMs, the difference between GPU and CPU, the best GPUs for local LLMs, troubleshooting common issues, and future-proofing the setup.
- 9
Threejs Tips and Inspiration·2y
Platform for learning GLSL Shaders
Shader Learning is a platform dedicated to teaching and practicing GPU programming through interactive tasks and theory. It covers topics like fragment and vertex shaders, 2D image manipulation, lighting, shadows, noise functions, texture mapping, and Signed Distance Field functions. The platform also delves into the fundamental mathematical principles behind computer graphics, with support available through a Discord channel.
- 10
Hacker News·2y
How I Self-Hosted Llama 3.2 with Coolify on My Home Server: A Step-by-Step Guide
Inspired by the trend of migrating Next.js applications to self-hosted environments, the author explores self-hosting Llama 3.2 using Coolify on a home server. The main goals include hosting a Next.js website, running Llama 3.2 with GPU acceleration, and setting up a wildcard domain for various services. Key challenges involved configuring the CUDA toolkit for GPU usage and securing the LLM API. The guide provides a detailed walkthrough of the setup process, offering insights into software installations, deployment, and troubleshooting.
- 11
Hacker News·2y
Dynolog: Open source system observability
Dynolog is an open-source system monitoring daemon designed for heterogeneous CPU-GPU systems. It supports always-on performance monitoring and deep-dive profiling modes, integrating with the PyTorch Profiler and Kineto CUDA profiling library. It monitors various hardware and kernel metrics, including CPU, GPU, and network usage, to help optimize AI model training distributed across multiple nodes. Dynolog aims to provide a holistic view of system performance without significant overhead and is actively developed with a focus on Linux platforms and Rust for future components.
- 12
Hacker News·2y
WebGPU Unleashed: A Practical Tutorial
Learn graphics programming in JavaScript using WebGPU through an interactive, web-based tutorial. The book covers an overview of GPU drivers and pipeline, basic and advanced rendering techniques, and GPU computing. The content includes code samples, videos, and a playground for interactive learning.
- 13
Jeff Geerling·2y
Use an External GPU on Raspberry Pi 5 for 4K Gaming
The post covers the process of setting up and using an external GPU with a Raspberry Pi 5 for enhanced 4K gaming performance. It includes detailed instructions on the necessary hardware setup, choosing compatible graphics cards, and patching the Linux kernel to enable full GPU support. Additionally, it discusses the performance benchmarks achieved and potential applications beyond gaming, such as video transcoding.
- 14
YouTube·2y
EASIEST Way to Fine-Tune a LLM and Use It With Ollama
Learn how to fine-tune a large language model (LLM) on your local machine using Ollama and Unsloth. Using a synthetic text-to-SQL dataset and tools like Anaconda, CUDA libraries, and Jupyter Notebook, you'll set up an environment to train a small but effective LLM. Benefits include reduced memory usage and easy deployment with Ollama. The approach works with or without a local GPU, since platforms like Google Colab can be used for cloud-based training.
- 15
Community Picks·2y
AI. Finally, a Reason for My Homelab
Running a home lab can be highly beneficial for those experimenting with AI. With detailed steps and hardware recommendations, this setup allows for local AI processing, cost-effective experimentation, and maintaining privacy. Ben Arent discusses his specific setup including using an AMD Ryzen 5 5600X CPU, 128GB RAM, and an NVIDIA RTX 4000 Ada GPU, all while leveraging Docker and various software tools. Cost analysis and future plans for the homelab server highlight the practical aspects of this approach.
- 16
Jeff Geerling·2y
LLMs accelerated with eGPU on a Raspberry Pi 5
A stable patch for the `amdgpu` Linux kernel driver now allows AMD RX series GPUs (400 to 7000) to work with Raspberry Pi 5, supporting both the Vulkan graphics and compute APIs. This guide provides steps to install `llama.cpp` with Vulkan support on the Pi 5. While larger models face performance issues due to inefficient memory access translations by the `amdgpu` driver, smaller models perform well. The compact, power-efficient setup uses only about 10-12W when idle, making it an attractive option for local machine learning tasks.
- 17
ThePrimeTime·1y
So I Tried To Learn Shaders...
The writer shares their journey of learning shaders, starting from a point of confusion back in college to making a renewed effort with modern tools like the Book of Shaders. They explain basic concepts such as what shaders are, how they function in parallel on GPUs, and the importance of uniforms in passing consistent input data to shaders. The post highlights the challenge of understanding shader syntax and execution but encourages continuous learning through practical experimentation.
- 18
NVIDIA Developer·2y
Machine Learning – What Is It and Why Does It Matter?
Many industries use data science and machine learning to recognize patterns, detect changes, and make predictions to enhance their operations. The availability of open-source tools has facilitated this trend since the mid-2000s. Today, improvements in predictive models can result in significant financial gains. However, training these models requires significant computational resources, with GPUs offering a solution to scalability issues that CPUs can no longer handle due to the limitations posed by Moore's law.
- 19
Hacker News·2y
cupy/cupy: NumPy & SciPy for GPU
CuPy is a Python library compatible with NumPy/SciPy, designed for GPU-accelerated computing. It supports NVIDIA CUDA and AMD ROCm platforms, offering functionality such as low-level CUDA features and direct CUDA Runtime API calls. Installation packages are available via PyPI and Conda-Forge for various architectures. CuPy also allows containerized execution with NVIDIA Container Toolkit.
- 20
Medium·2y
Getting started with Flutter GPU
Flutter 3.24 introduces Flutter GPU, a new low-level graphics API for custom rendering in Dart, alongside a 3D rendering library called Flutter Scene. Both are in early preview and require Impeller support, with guides provided for setting up projects and drawing graphics. Flutter GPU offers potential for cross-platform rendering solutions in Flutter, while Flutter Scene aims to simplify 3D development in Flutter apps. The post details the steps to add both to a Flutter project and shows what they make possible.
- 21
ML & AI·2y
Llama3 70B on 4GB GPU, Llama3.1 405B on 8GB GPU with AirLLM lib.
The AirLLM library enables running large language models (LLMs) like Llama3 70B and Llama3.1 405B on GPUs with minimal memory requirements. It supports multiple models and offers 4-bit/8-bit compression to significantly speed up inference. Techniques such as layer-wise model decomposition and block-wise quantization reduce memory usage and disk loading bottlenecks, although there may be increased latency due to reliance on slower disk I/O.
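The arithmetic behind layer-wise loading can be sketched in a few lines. This is a back-of-the-envelope estimate, not AirLLM's own accounting: the 80-layer count is an assumption about Llama3 70B that does not come from the summary above, the weights are treated as evenly spread across layers, and embeddings, activations, and the KV cache are ignored.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store num_params weights at the given precision."""
    return num_params * bits_per_weight // 8


NUM_PARAMS = 70_000_000_000  # Llama3 70B, taken from the model name
NUM_LAYERS = 80              # assumption: not stated in the summary

full_fp16 = weight_bytes(NUM_PARAMS, 16)   # whole model in 16-bit precision
per_layer_fp16 = full_fp16 / NUM_LAYERS    # one layer resident at a time
full_4bit = weight_bytes(NUM_PARAMS, 4)    # whole model after 4-bit compression

if __name__ == "__main__":
    print(f"full model, fp16: {full_fp16 / 1e9:.2f} GB")
    print(f"one layer, fp16:  {per_layer_fp16 / 1e9:.2f} GB")
    print(f"full model, 4-bit: {full_4bit / 1e9:.2f} GB")
```

Under these assumptions a single layer is far smaller than the full model, which is why holding one layer at a time fits in a few gigabytes while the rest waits on disk.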
- 22
Hacker News·2y
srush/GPU-Puzzles: Solve puzzles. Learn CUDA.
GPU architectures are increasingly important in machine learning. This interactive notebook helps beginners learn GPU programming using Numba to map Python code to CUDA kernels. It features exercises that teach you to build GPU kernels, and Google Colab is suggested for running them. It steps through multiple coding examples and common pitfalls to help develop a solid understanding of GPU programming techniques.
- 23
It's Foss·2y
Monitor GPU Usage on Ubuntu and Other Linux Systems
Monitoring GPU usage on Ubuntu and other Linux distributions requires specific tools as default system utilities do not display GPU stats. For GUI preferences, Mission Center supports NVIDIA, AMD, and Intel GPUs, with installation available via Flatpak or AUR for Arch Linux. For command-line options, nvidia-smi is suitable for NVIDIA GPUs, while nvtop and gpustat are versatile for multiple GPU brands. Keeping track of GPU utilization and temperature can diagnose performance issues and ensure proper resource use.
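For the command-line route, a small script can poll `nvidia-smi` and parse its CSV output. The sketch below is a minimal example for NVIDIA GPUs only; the helper names `parse_gpu_stats` and `read_gpu_stats` are hypothetical, and it assumes both queried fields come back as plain integers (some devices report `[N/A]` for a field, which this parser does not handle).

```python
import shutil
import subprocess

# Query utilization (%) and temperature (C) as bare CSV, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> list:
    """Turn lines like '37, 54' into dictionaries of integers."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, temp = (field.strip() for field in line.split(","))
        stats.append({"utilization_pct": int(util), "temperature_c": int(temp)})
    return stats


def read_gpu_stats() -> list:
    """Return stats for each GPU, or an empty list if nvidia-smi is unusable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        return []
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    print(read_gpu_stats())
```

Keeping the parsing separate from the subprocess call makes the parser easy to check on a machine without a GPU.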
- 24
Daily Dose of Data Science | Avi Chawla | Substack·2y
A Subtle Trick to Optimize Neural Network Training
Discover a subtle optimization trick for neural network training that involves normalizing data after transferring it to the GPU. This simple rearrangement can significantly reduce data transfer time, especially in tasks like image classification where pixel values are initially 8-bit integers. While the technique may not apply to all use cases, such as NLP, it can offer noticeable performance gains in applicable scenarios.
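The saving comes from how many bytes cross the CPU-to-GPU link. The sketch below compares the size of one image batch sent as raw 8-bit integers with the same batch sent after conversion to 32-bit floats; the batch shape (64 RGB images at 224x224) is a hypothetical choice for illustration, and `batch_bytes` is a made-up helper name.

```python
def batch_bytes(batch: int, channels: int, height: int, width: int,
                bytes_per_value: int) -> int:
    """Size in bytes of one dense image batch."""
    return batch * channels * height * width * bytes_per_value


# Hypothetical batch: 64 RGB images of 224x224 pixels.
SHAPE = (64, 3, 224, 224)

uint8_bytes = batch_bytes(*SHAPE, bytes_per_value=1)    # transfer first, normalize on GPU
float32_bytes = batch_bytes(*SHAPE, bytes_per_value=4)  # normalize on CPU, then transfer

if __name__ == "__main__":
    print(f"uint8 batch:   {uint8_bytes / 1e6:.1f} MB")
    print(f"float32 batch: {float32_bytes / 1e6:.1f} MB")
    print(f"ratio: {float32_bytes // uint8_bytes}x")
```

Whatever the batch shape, converting to 32-bit floats before the transfer moves four times as many bytes as sending the 8-bit pixels and normalizing on the GPU.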
- 25
