Best of GPU — September 2024

1
Article
Community Picks·2y
AI. Finally, a Reason for My Homelab
A home lab gives anyone experimenting with AI a place to run models locally, which keeps experimentation cost-effective and data private. Ben Arent walks through his setup step by step, with hardware recommendations: an AMD Ryzen 5 5600X CPU, 128GB RAM, and an NVIDIA RTX 4000 Ada GPU, running Docker and various software tools. A cost analysis and his future plans for the server cover the practical side of this approach.
29
2
2
Article
Hacker News·2y
cupy/cupy: NumPy & SciPy for GPU
CuPy is a Python library compatible with NumPy/SciPy, designed for GPU-accelerated computing. It supports NVIDIA CUDA and AMD ROCm platforms, offering functionality such as low-level CUDA features and direct CUDA Runtime API calls. Installation packages are available via PyPI and Conda-Forge for various architectures. CuPy also allows containerized execution with NVIDIA Container Toolkit.
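As a rough illustration of what NumPy compatibility means in practice, the sketch below runs the same array code on CuPy when it is available and on NumPy otherwise. The helper names `get_array_module` and `scaled_sum` are hypothetical and chosen for this example only; they are not part of CuPy.

```python
import numpy as np


def get_array_module():
    """Return cupy when it is installed and a GPU is usable, else numpy."""
    try:
        import cupy as cp

        # getDeviceCount raises (or returns 0) when no usable device exists.
        if cp.cuda.runtime.getDeviceCount() < 1:
            return np
        return cp
    except Exception:
        # CuPy missing, or no working driver/device: fall back to the CPU.
        return np


def scaled_sum(xp, n=6, factor=2.0):
    """Build 0..n-1 as float32, scale it, and return the sum as a Python float."""
    x = xp.arange(n, dtype=xp.float32)
    return float((x * factor).sum())


xp = get_array_module()
print(xp.__name__, scaled_sum(xp))
```

The array code inside `scaled_sum` is identical for both libraries; only the module passed in changes.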
21
2
3
Article
ML & AI·2y
Llama3 70B on 4GB GPU, Llama3.1 405B on 8GB GPU with AirLLM lib.
The AirLLM library runs large language models (LLMs) such as Llama3 70B and Llama3.1 405B on GPUs with as little as 4GB and 8GB of memory, respectively. It supports multiple models and offers 4-bit/8-bit compression to significantly speed up inference. Techniques such as layer-wise model decomposition and block-wise quantization reduce memory usage and disk-loading bottlenecks, although latency may increase because of the reliance on slower disk I/O.
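To make the idea of block-wise quantization concrete, here is a minimal pure-Python sketch. It assumes a simple absmax int8 scheme with one scale per block; this is an illustration of the general technique, not AirLLM's actual implementation, and the function names are hypothetical.

```python
def quantize_blockwise(values, block_size=4):
    """Map floats to int8-range codes, storing one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # An all-zero block gets a dummy scale to avoid dividing by zero.
        scale = absmax / 127 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=4):
    """Recover approximate floats by multiplying each code by its block scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


weights = [0.1, -0.5, 0.25, 1.0, 10.0, -20.0, 5.0, 0.0]
codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)
```

Because each block carries its own scale, the large values in the second block do not wash out the precision of the small values in the first.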
20
4
Article
Hacker News·2y
srush/GPU-Puzzles: Solve puzzles. Learn CUDA.
GPU architectures are increasingly important in machine learning. This interactive notebook helps beginners learn GPU programming by using Numba to map Python code to CUDA kernels. Its exercises teach you to build GPU kernels, and Google Colab is the suggested environment for working through them. The notebook steps through multiple coding examples and common pitfalls to build a solid understanding of GPU programming techniques.
19
5
Article
It's Foss·2y
Monitor GPU Usage on Ubuntu and Other Linux Systems
Monitoring GPU usage on Ubuntu and other Linux distributions requires specific tools, because the default system utilities do not display GPU stats. For those who prefer a GUI, Mission Center supports NVIDIA, AMD, and Intel GPUs and can be installed via Flatpak, or from the AUR on Arch Linux. On the command line, nvidia-smi is suitable for NVIDIA GPUs, while nvtop and gpustat are versatile for multiple GPU brands. Keeping track of GPU utilization and temperature can help diagnose performance issues and ensure proper resource use.
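For scripted monitoring on NVIDIA hardware, the sketch below reads utilization, temperature, and memory from nvidia-smi's CSV query output. It assumes `nvidia-smi` is on the PATH; the function names and dictionary keys are hypothetical, and parsing is kept separate from the subprocess call so it can be tried on sample text without a GPU.

```python
import subprocess

QUERY_FIELDS = ["utilization.gpu", "temperature.gpu", "memory.used", "memory.total"]
KEYS = ["utilization_pct", "temperature_c", "memory_used_mib", "memory_total_mib"]


def _to_int(text):
    """Convert a field to int, or None for values such as '[N/A]'."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_stats(csv_text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        stats.append(dict(zip(KEYS, (_to_int(p) for p in parts))))
    return stats


def read_gpu_stats():
    """Run nvidia-smi and return parsed stats (requires an NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)


sample = "35, 62, 2048, 16384\n0, 41, 12, 16384\n"
parsed = parse_gpu_stats(sample)
```

The `sample` string is made-up data in the shape nvidia-smi emits for two GPUs; call `read_gpu_stats()` on a machine with an NVIDIA driver to get live values.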
19
6
Article
Daily Dose of Data Science | Avi Chawla | Substack·2y
A Subtle Trick to Optimize Neural Network Training
A subtle optimization for neural network training: normalize the data after transferring it to the GPU rather than before. This simple rearrangement can significantly reduce data transfer time, especially in tasks like image classification, where pixel values start out as 8-bit integers and are therefore smaller to move than their normalized floating-point counterparts. The technique may not apply to all use cases, such as NLP, but it can offer noticeable performance gains where it does.
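The arithmetic behind the saving can be sketched as follows. The batch shape (32 RGB images at 224x224) and the choice of float32 as the normalized type are assumptions made for illustration; the helper name `batch_bytes` is hypothetical.

```python
def batch_bytes(batch, channels, height, width, bytes_per_value):
    """Size in bytes of a dense image batch with the given element width."""
    return batch * channels * height * width * bytes_per_value


# Assumed batch: 32 RGB images at 224x224.
shape = (32, 3, 224, 224)

uint8_bytes = batch_bytes(*shape, bytes_per_value=1)    # raw 8-bit pixels
float32_bytes = batch_bytes(*shape, bytes_per_value=4)  # normalized float32

print(f"uint8 transfer:   {uint8_bytes / 1e6:.2f} MB")
print(f"float32 transfer: {float32_bytes / 1e6:.2f} MB")
print(f"ratio: {float32_bytes // uint8_bytes}x")
```

Under these assumptions, sending the raw 8-bit pixels and normalizing on the GPU moves a quarter of the bytes that sending pre-normalized float32 data would.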
19
7
Article
PyTorch·2y
CUDA-Free Inference for LLMs
The post discusses achieving FP16 inference with popular LLMs such as Meta’s Llama3-8B and IBM’s Granite-8B Code using only the Triton language, and compares its performance to CUDA-dominant workflows on NVIDIA GPUs. Triton offers cross-GPU compatibility, a higher level of abstraction, and faster kernel development. The post covers Triton-based kernel implementations, benchmarks showing up to 82% of CUDA performance, and future optimizations for better GPU utilization.
13
1
8
Article
Jendrik Illner·2y
Graphics Programming weekly - Issue 359 - September 29th, 2024
An issue showcasing various facets of graphics programming, including tools like RenderDoc and Nsight for game content analysis, insights into Vulkan Device Generated Commands, and a rundown of sessions by AMD at the Graphics Programming Conference. Highlights include an introduction to OpenUSD concepts through NVIDIA’s interactive course and a detailed paper on guiding direct light sampling in scenes with numerous lights.
10
1
9
Video
TechLinked·2y
The RTX 5090 Cometh
NVIDIA's upcoming RTX 5090 and 5080 graphics cards feature impressive specs, including up to 32 GB of VRAM for the 5090. Intel's new Core Ultra 200 series laptops outperform Qualcomm's Snapdragon X Series, offering better battery life and full x86 app support. OpenAI is restructuring to shift control away from its nonprofit board in favor of a for-profit model, which has prompted several leadership changes. Microsoft is revising its controversial Recall feature, and Meta is utilizing AI-generated content based on users' interests. Researchers discovered a significant security vulnerability in Kia's website that allowed remote access to vehicle functionality.
10
