Best of GPU: June 2025

  1. Article
    Hacker News · 51w

    Building an AI Server on a Budget ($1.3K)

    A comprehensive guide to building a custom AI server for $1,300. It covers hardware selection (RTX 4070 GPU, motherboard, CPU, RAM), assembly, Ubuntu Server installation, and software setup, including NVIDIA drivers and the CUDA toolkit. The build prioritizes cost-effectiveness for AI workloads while leaving room for future upgrades.

  2. Article
    Hacker News · 48w

    sirius-db/sirius

    Sirius is a GPU-native SQL engine that plugs into existing databases such as DuckDB through the Substrait query format. It delivers roughly 10x speedups over CPU-based query engines on TPC-H benchmarks at the same hardware cost. The system supports NVIDIA GPUs with compute capability 7.0+ and CUDA 11.2+, and can be deployed from AWS AMIs, Docker images, or a manual installation. Sirius handles common SQL operations, including filtering, joins, aggregations, and ordering, though it currently has limitations: constraints on data size, limits on row counts, and only partial support for NULL columns.

  3. Article
    Chrome Developers · 48w

    What's New in WebGPU (Chrome 138)

    Chrome 138 introduces several WebGPU changes: simplified buffer binding syntax, stricter size validation for mapped buffers, updated GPU architecture reporting for NVIDIA Blackwell and AMD RDNA4, deprecation of GPUAdapter's isFallbackAdapter attribute, and Dawn updates, including Emscripten integration for cross-platform development.

  4. Video
    YouTube · 49w

    This Laptop Runs LLMs Better Than Most Desktops

    The Asus Flow Z13 (2025) with AMD's Ryzen AI Max+ 395 APU can run 110-billion-parameter LLMs thanks to its 128 GB of unified memory, outperforming many desktop setups. The APU combines CPU and GPU on a single chip, letting the GPU address a large pool of shared memory. However, unlike Apple's true unified memory architecture, AMD's implementation requires splitting memory between CPU and GPU at boot time. Performance testing shows that manual memory allocation settings significantly outperform the Auto setting, with a 16 GB GPU allocation often giving the best results. The system's 235 GB/s memory bandwidth makes it competitive with Apple Silicon, though the memory copying that occurs during model loading exposes the limits of this design compared to true unified memory systems.
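For the laptop video (item 4), a back-of-the-envelope sketch shows why 128 GB of memory matters for a 110B-parameter model and what 235 GB/s of bandwidth implies for generation speed. It assumes a dense model (every weight is read once per generated token), uniform quantization at the listed bit widths, and it ignores the KV cache, activations, and OS overhead. The bit widths and function names are illustrative assumptions, not figures from the video.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, bytes_read_per_token_gb: float) -> float:
    """Bandwidth-bound upper limit on decode speed.

    Assumes each generated token must stream every weight it uses from memory
    once, so throughput cannot exceed bandwidth / bytes read per token.
    """
    return bandwidth_gb_s / bytes_read_per_token_gb


MEMORY_GB = 128       # total unified memory quoted in the review
BANDWIDTH_GB_S = 235  # memory bandwidth quoted in the review

for bits in (16, 8, 4):
    size = weight_footprint_gb(110, bits)
    under_total = size < MEMORY_GB  # ignores OS, KV cache, and the CPU/GPU split
    ceiling = decode_ceiling_tokens_per_s(BANDWIDTH_GB_S, size)
    print(f"{bits:>2}-bit: {size:6.1f} GB, under 128 GB: {under_total}, "
          f"<= {ceiling:.1f} tok/s if dense")
```

Under these assumptions, 16-bit weights (220 GB) cannot fit at all, which is why large models on this machine rely on quantization. Mixture-of-experts models read only their active experts per token, so their real ceiling can sit well above the dense estimate; measured throughput for a dense model will generally land below it.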
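For the Sirius entry (item 2), the stated hardware floor (compute capability 7.0+, CUDA 11.2+) reduces to a simple version comparison. This is a minimal sketch, assuming you already know your GPU's compute capability and installed CUDA version as "major.minor" strings; the function names are hypothetical and not part of Sirius.

```python
def parse_major_minor(version: str) -> tuple[int, int]:
    """Turn a version string such as "8.9" or "12.4.1" into (major, minor)."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def meets_sirius_minimums(compute_capability: str, cuda_version: str) -> bool:
    """Check the stated floor: compute capability >= 7.0 and CUDA >= 11.2.

    Tuples compare element by element, so (8, 9) >= (7, 0) and (11, 1) < (11, 2).
    """
    return (parse_major_minor(compute_capability) >= (7, 0)
            and parse_major_minor(cuda_version) >= (11, 2))


# An RTX 4070 (Ada Lovelace) has compute capability 8.9.
print(meets_sirius_minimums("8.9", "12.4"))
print(meets_sirius_minimums("6.1", "12.4"))  # Pascal-era GPU: below 7.0
print(meets_sirius_minimums("7.5", "11.0"))  # CUDA older than 11.2
```

Comparing (major, minor) tuples rather than floats avoids the trap where a version such as "11.10" would parse as 11.1 and compare as smaller than 11.2.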