Best of GPU — June 2025

1
Article
Hacker News·51w
Building an AI Server on a Budget ($1.3K)
A comprehensive guide to building a custom AI server for $1,300, covering hardware selection (RTX 4070 GPU, motherboard, CPU, RAM), assembly process, Ubuntu Server installation, and software setup including NVIDIA drivers and CUDA toolkit. The build prioritizes cost-effectiveness for AI workloads while maintaining upgrade flexibility for future expansion.
108
17
2
Article
Hacker News·48w
sirius-db/sirius
Sirius is a GPU-native SQL engine that integrates with existing databases like DuckDB through the Substrait query format. It delivers an approximately 10x speedup over CPU-based query engines on TPC-H benchmarks at the same hardware cost. The system supports NVIDIA GPUs with compute capability 7.0+ and CUDA 11.2+, and can be deployed through AWS AMIs, Docker images, or manual installation. Sirius handles common SQL operations including filtering, joins, aggregations, and ordering, though it currently limits data size and row count and only partially supports NULL columns.
24
1
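As a rough illustration of the hardware floor quoted in the Sirius summary (compute capability 7.0+ and CUDA 11.2+), the check below compares version pairs as tuples. The function and constant names are hypothetical and are not part of Sirius; how you obtain the two versions on a given machine is left out, so treat this as a sketch only.

```python
from typing import Tuple

# Minimums taken from the summary above; the names are illustrative, not Sirius APIs.
MIN_COMPUTE_CAPABILITY: Tuple[int, int] = (7, 0)
MIN_CUDA_VERSION: Tuple[int, int] = (11, 2)


def meets_sirius_requirements(
    compute_capability: Tuple[int, int],
    cuda_version: Tuple[int, int],
) -> bool:
    """Return True if both (major, minor) pairs meet the stated minimums."""
    # Tuples compare element by element, so (11, 10) correctly ranks above (11, 2),
    # which a plain float or string comparison would get wrong.
    return (
        compute_capability >= MIN_COMPUTE_CAPABILITY
        and cuda_version >= MIN_CUDA_VERSION
    )


if __name__ == "__main__":
    print(meets_sirius_requirements((8, 9), (12, 4)))
    print(meets_sirius_requirements((6, 1), (12, 4)))
```

Comparing (major, minor) tuples rather than floats avoids treating a version such as 11.10 as older than 11.2.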
3
Article
Chrome Developers·48w
What's New in WebGPU (Chrome 138)
Chrome 138 introduces several WebGPU improvements, including simplified buffer binding syntax, stricter size validation for mapped buffers, updated GPU architecture reporting for NVIDIA Blackwell and AMD RDNA4, deprecation of GPUAdapter's isFallbackAdapter attribute, and enhanced Dawn framework support with Emscripten integration for cross-platform development.
11
4
Video
YouTube·49w
This Laptop Runs LLMs Better Than Most Desktops
The Asus Flow Z13 2025 with AMD's Ryzen AI Max+ 395 APU can run 110-billion-parameter LLMs thanks to its 128 GB of unified memory, outperforming many desktop setups. The APU combines CPU and GPU on a single chip, allowing the GPU to access large amounts of shared memory. However, unlike Apple's true unified memory architecture, AMD's implementation requires pre-allocating memory between CPU and GPU at boot time. Performance testing shows that manual memory allocation settings significantly outperform auto settings, with a 16 GB GPU allocation often giving the best results. The system's 235 GB/s memory bandwidth enables competitive performance against Apple Silicon, though the memory copying during model loading reveals architectural limitations compared to true unified memory systems.
11
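The memory figures in the summary above can be sanity-checked with simple arithmetic. The sketch below is not taken from the video: it assumes a dense model, counts weights only (no KV cache, activations, or operating system overhead), and treats the quantization widths as illustrative choices. The bandwidth bound assumes every weight is read once per generated token, so real throughput would be lower.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit; ignores KV cache and other overhead."""
    return weights_gb(params_billion, bits_per_weight) <= memory_gb


def decode_tokens_per_s_bound(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token requires reading all weights once."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # 110B parameters, 128 GB, and 235 GB/s come from the summary;
    # the bit widths are assumptions for illustration.
    for bits in (16, 8, 4):
        size = weights_gb(110, bits)
        ok = fits_in_memory(110, bits, 128)
        bound = decode_tokens_per_s_bound(235, size)
        print(f"{bits}-bit: {size:.0f} GB, fits={ok}, <= {bound:.2f} tok/s")
```

Under these assumptions, a 110-billion-parameter model needs about 220 GB at 16 bits per weight, so it fits in 128 GB only when quantized to 8 bits or fewer, and memory bandwidth rather than compute sets the ceiling on generation speed.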
