Best of GPU — 2025

1
Article
Noted·1y
Self-Host DeepSeek with Ollama and Open WebUI
DeepSeek R1 is a reasoning-focused AI model developed by the Chinese company DeepSeek and released in January 2025. Distilled versions available on Ollama let it run efficiently on lower-grade hardware. The post provides instructions for installing DeepSeek R1 with Ollama and Open WebUI and highlights the model's capabilities in natural language understanding and problem-solving. By sharing its technology openly, DeepSeek aims to drive innovation in AGI systems.
193
6
2
Article
Modal·37w
Introducing Notebooks
Modal launches Notebooks, a collaborative cloud-based computing environment that provides instant GPU-enabled Python kernels starting in under 5 seconds. The platform offers real-time collaborative editing, automatic resource scaling from 0.125 CPUs to 8 H100/B200 GPUs, and seamless integration with Modal's existing infrastructure including Volumes and Functions. Key features include fast cold-start times, automatic idle shutdown to reduce costs, shared environments across teams, and modern development tools like LSP support and AI completions.
183
3
Article
Hacker News·51w
Building an AI Server on a Budget ($1.3K)
A comprehensive guide to building a custom AI server for $1,300, covering hardware selection (RTX 4070 GPU, motherboard, CPU, RAM), assembly process, Ubuntu Server installation, and software setup including NVIDIA drivers and CUDA toolkit. The build prioritizes cost-effectiveness for AI workloads while maintaining upgrade flexibility for future expansion.
108
17
4
Article
Hacker News·47w
NVIDIA is full of shit
NVIDIA's RTX 50 series launch has been plagued by multiple issues including scalper bots, melting power connectors, defective chips missing processing units, and unstable drivers. The company continues using the problematic 12VHPWR connector despite known design flaws that can cause cables to melt under certain conditions. NVIDIA's marketing heavily relies on DLSS upscaling technology to achieve advertised performance numbers, with even flagship cards unable to run ray-traced games at native 4K resolution. The company has also been accused of pressuring tech reviewers to include specific metrics in their coverage and threatening to withdraw access for unfavorable reviews. With over 90% market share, NVIDIA's dominance has led to vendor lock-in through proprietary technologies while charging premium prices for incremental performance improvements.
84
15
5
Article
DEV·1y
How to Install DeepSeek-R1 32B on Windows: System Requirements, Docker, Ollama, and WebUI Setup
This guide provides detailed instructions for installing DeepSeek-R1 32B on Windows using three different methods: Docker, Ollama, and WebUI. It includes the system requirements for both minimum and recommended setups, steps for installing each method, and considerations for choosing the best installation method based on user needs and hardware capabilities.
75
6
Video
LaurieWired·1y
Pixar DOESN'T use GPUs...
Pixar primarily uses CPUs instead of GPUs for rendering due to their flexibility and ability to handle large scenes with over 150 million polygons. Their render farms leverage AVX-512 and SSE4.2 extensions for optimization. While GPUs can offer significant speed gains, they are limited by their memory capacity, making CPUs a more viable option for Pixar's needs.
68
1
7
Article
InfoWorld·29w
Perplexity’s open-source tool to run trillion-parameter models without costly upgrades
Perplexity AI released TransferEngine, an open-source tool that enables trillion-parameter language models to run across different cloud providers' GPU hardware at full speed. The software solves vendor lock-in by creating a universal interface for GPU-to-GPU communication that works on both Nvidia ConnectX and AWS EFA network adapters. This allows companies to run massive models like DeepSeek V3 and Kimi K2 on older H100 and H200 systems instead of purchasing expensive next-generation hardware. TransferEngine achieves 400 Gbps throughput using RDMA technology and is already powering Perplexity's production AI search engine, handling disaggregated inference, reinforcement learning, and Mixture-of-Experts routing.
59
8
Article
Noted·28w
The 10MB Discord Limit Drove Me to Build a Self-Hosted GPU Video Compressor
A developer built 8mb.local, a self-hosted video compression tool that solves Discord's 10MB file size limit. The single Docker container includes a SvelteKit UI, FastAPI backend, and Celery worker queue with automatic GPU detection for NVIDIA, Intel, and AMD hardware acceleration. It features target-size-first compression with automatic retry logic, real-time progress streaming via Server-Sent Events, and seamless CPU fallback. Installation requires choosing the appropriate docker-compose configuration for your hardware, with special attention to NVIDIA driver capabilities and reverse proxy buffering settings for proper SSE streaming.
56
5
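The target-size-first compression with retry described in the 8mb.local entry above comes down to simple arithmetic: divide the size budget by the clip length, subtract the audio bitrate, and lower the result if the output still overshoots. The sketch below is an illustration under stated assumptions, not the tool's actual code. The function names, the 128 kbit/s audio default, the 2% container overhead, the 5% safety margin, and the use of decimal megabytes are all hypothetical choices.

```python
def video_bitrate_kbps(target_mb, duration_s, audio_kbps=128.0, container_overhead=0.02):
    """Estimate the video bitrate (kbit/s) that fits a clip into target_mb.

    Assumes decimal megabytes (1 MB = 8000 kbit); all defaults are illustrative.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Total bit budget, minus a small allowance for container metadata.
    budget_kbit = target_mb * 8000.0 * (1.0 - container_overhead)
    video_kbps = budget_kbit / duration_s - audio_kbps
    if video_kbps <= 0:
        raise ValueError("target size is too small for this duration and audio bitrate")
    return video_kbps


def retry_bitrate_kbps(previous_kbps, actual_mb, target_mb, safety=0.95):
    """Scale the bitrate down in proportion to how far the last attempt overshot."""
    return previous_kbps * (target_mb / actual_mb) * safety


if __name__ == "__main__":
    first = video_bitrate_kbps(target_mb=10, duration_s=60)
    print(f"first attempt: {first:.0f} kbit/s")
    # Suppose the encoder produced 11 MB instead of 10 MB.
    print(f"retry: {retry_bitrate_kbps(first, actual_mb=11, target_mb=10):.0f} kbit/s")
```

A proportional retry like this tends to converge in one or two attempts because encoded size is roughly linear in bitrate, though real encoders do not hit a requested bitrate exactly.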
9
Article
Medium·36w
Don’t buy GPUs for AI
GPUs are becoming unnecessary for most AI applications as smaller language models like Mistral 7B and Phi-3 Mini deliver practical results on CPUs. Modern processors, edge devices with NPUs, and cloud rental options provide cost-effective alternatives to expensive GPU ownership. Specialized hardware like TPUs and software optimizations through quantization are making GPUs obsolete for all but the largest model training operations.
53
9
10
Article
Product Hunt·42w
SelfHostLLM: Calculate the GPU memory you need for LLM inference
SelfHostLLM is a tool that helps developers calculate GPU memory requirements and maximum concurrent requests for self-hosted large language model inference. It supports popular models like Llama, Qwen, DeepSeek, and Mistral, allowing users to plan their AI infrastructure efficiently with custom configurations.
49
1
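The kind of estimate the SelfHostLLM entry above refers to can be approximated with a common back-of-the-envelope formula: weight memory is parameter count times bytes per parameter, and each request adds a key/value cache that grows with context length. The sketch below is a generic estimate, not SelfHostLLM's own formula. The function names, the 2 GiB overhead, and the example model shape (32 layers, 8 KV heads, head dimension 128) are hypothetical.

```python
GIB = 2 ** 30


def weights_gib(params_billions, bytes_per_param=2.0):
    """Memory for model weights; 2 bytes per parameter corresponds to FP16/BF16."""
    return params_billions * 1e9 * bytes_per_param / GIB


def kv_cache_gib_per_request(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2.0):
    """KV cache for one request: a key and a value vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / GIB


def max_concurrent_requests(gpu_gib, weights, kv_per_request, overhead_gib=2.0):
    """Number of full-context requests that fit in the memory left after weights and overhead."""
    free = gpu_gib - weights - overhead_gib
    if free <= 0:
        return 0
    return int(free // kv_per_request)


if __name__ == "__main__":
    # Hypothetical 7B-parameter model on a 24 GiB GPU with an 8192-token context.
    w = weights_gib(7)
    kv = kv_cache_gib_per_request(layers=32, kv_heads=8, head_dim=128, context_tokens=8192)
    print(f"weights: {w:.2f} GiB, KV cache per request: {kv:.2f} GiB")
    print("max concurrent requests:", max_concurrent_requests(24, w, kv))
```

Quantized weights lower `bytes_per_param` (about 0.5 for 4-bit), which frees memory for more concurrent requests. Real serving stacks also reserve memory for activations and fragmentation, so treat the result as an upper bound.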
11
Article
HelixML·30w
Technical Deep Dive on Streaming AI Agent Desktop Sandboxes: When Gaming Protocols Meet Multi-User Access
Helix adapted Moonlight, a gaming streaming protocol designed for single-player sessions, to stream GPU-accelerated desktop environments for AI agents to multiple users simultaneously. The team initially used "apps mode" with a workaround where their API pretended to be a client to start containers, but are migrating to "lobbies mode" which natively supports multi-user access to shared sessions. The solution enables low-latency (50-100ms) streaming of full Linux desktops with AI agents working in real IDEs and browsers, though challenges remain with input scaling and video corruption across different client resolutions.
46
1
12
Article
NVIDIA Developer·28w
Release v1.10.0 · NVIDIA/warp
NVIDIA Warp v1.10.0 introduces experimental JAX automatic differentiation support and multi-device compatibility with jax.pmap(). The release enhances tile programming with axis-specific reductions and component-level indexing, while delivering significant performance improvements including up to 70× faster built-in function calls from Python and in-place BVH rebuilding with CUDA graph support. New features include negative array indexing, atomic bitwise operations, and error functions. The warp.sim module has been removed after deprecation, with users directed to migrate to the Newton physics engine.
44
13
Article
Noted·1y
Generate Stunning Background Images using Self Hosted Fooocus in Docker
Learn how to generate stunning background images using the self-hosted Fooocus tool in Docker. The guide provides step-by-step instructions on setting up Fooocus with Docker Compose, downloading necessary models, and optimizing GPU performance for better image generation. It also covers customizing settings for different styles and resolutions, making it a powerful tool for unleashing creativity.
38
14
Article
ITNEXT·52w
AI: Introduction to Ollama for local LLM launch
Ollama provides an easy way to run large language models locally on your own hardware. The guide covers installation on Linux, setting up GPU acceleration with NVIDIA cards, basic commands for model management, and integration with Python applications. It demonstrates running DeepSeek-R1 models, monitoring performance metrics, adjusting context windows, and creating custom models using Modelfiles with system prompts. Local deployment offers cost savings, privacy benefits, and the ability to experiment with models not available through public APIs.
32
15
Article
Hacker News·34w
newton-physics/newton: An open-source, GPU-accelerated physics simulation engine built upon NVIDIA Warp, specifically targeting roboticists and simulation researchers.
Newton is a GPU-accelerated physics simulation engine built on NVIDIA Warp, designed for robotics and simulation research. The project extends Warp's deprecated sim module and integrates MuJoCo Warp as its primary backend. Key features include GPU-based computation, OpenUSD support, differentiability, and extensibility. Currently in active beta under the Linux Foundation with Apache 2.0 licensing, Newton was initiated by Disney Research, Google DeepMind, and NVIDIA. The engine includes extensive examples covering basic physics, robot simulations, cloth dynamics, inverse kinematics, material point method (MPM), and differentiable simulation scenarios.
30
16
Article
Where's Your Ed At·27w
The Hater's Guide To NVIDIA
NVIDIA dominates the AI hardware market by selling increasingly expensive GPUs (from $10,000 A100s to $30,000+ B200s) that power large language models. The company's success depends on customers—primarily Microsoft, Google, Meta, and Amazon—continuously purchasing new GPU generations, often funded through massive debt. Building a small 25MW AI data center costs over $1 billion, with $600 million for GPUs alone, plus 20 acres of land and 6-18 months of construction. Despite NVIDIA's $50+ billion quarterly revenue and 8% weight in the S&P 500, the underlying economics appear unsustainable: AI companies generate only ~$61 billion in revenue annually while spending hundreds of billions on infrastructure, with no clear path to profitability.
27
5
17
Article
Hacker News·48w
sirius-db/sirius
Sirius is a GPU-native SQL engine that integrates with existing databases like DuckDB through the Substrait query format. It delivers approximately 10x performance improvements over CPU-based query engines on TPC-H benchmarks while maintaining the same hardware costs. The system supports NVIDIA GPUs with compute capability 7.0+ and CUDA 11.2+, offering deployment options through AWS AMIs, Docker images, or manual installation. Sirius handles common SQL operations including filtering, joins, aggregations, and ordering, though it currently has limitations around data size constraints, row count limits, and partial NULL column support.
24
1
18
Article
Phoronix·32w
Valve Developer Contributes Open-Source Driver Fixes For 12 Year Old Hawaii GPUs
Valve's open-source Linux graphics team has contributed driver fixes for AMD's 12-year-old Hawaii GPU architecture. This continues their work on improving Linux GPU driver support for legacy hardware that original vendors no longer maintain, alongside recent achievements like enabling NVIDIA DLSS on the open-source NVK driver.
22
1
19
Article
Community Picks·1y
GPU Glossary
A comprehensive glossary detailing various aspects of GPUs, including types, performance metrics, and their applications in computing and gaming.
22
2
20
Article
Cloudflare·26w
Why Replicate is joining Cloudflare
Replicate, a platform for running machine learning models as APIs, has been acquired by Cloudflare. Founded in 2019 to make research models accessible to developers through tools like Cog, Replicate became a key infrastructure provider during the Stable Diffusion era. The acquisition enables integration with Cloudflare's network infrastructure, Workers, R2, and other services to build a comprehensive AI stack. The combined platform aims to support edge model execution, instant-booting Workers for model pipelines, and WebRTC streaming for model inputs and outputs.
21
21
Article
The Register·41w
Sam Altman admits that AI is a bubble, but still a big thing
OpenAI CEO Sam Altman acknowledged that AI is currently in a bubble phase, comparing it to the dot-com era where overexcitement led to inflated valuations despite underlying technological importance. He believes AI will survive the eventual burst, similar to how the internet persisted after the dot-com crash. Despite recognizing the bubble, Altman plans massive expansion, stating OpenAI will spend trillions on datacenter construction. The company faces GPU shortages that influenced GPT-5's design focus on cost optimization rather than power. OpenAI's revenue reached $10 billion annually but the company still operates at a loss, raising questions about funding sources for ambitious expansion plans.
21
1
22
Article
Hacker News·1y
Rust-GPU/Rust-CUDA: Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
The Rust CUDA Project aims to make Rust a tier-1 language for high-performance GPU computing using the CUDA toolkit. This effort includes developing tools to compile Rust to PTX code and create libraries that facilitate the use of existing CUDA libraries in Rust. The project addresses historical issues with Rust's compatibility with CUDA and seeks to advance Rust's role in GPU computing. Contributions are welcome as the project is actively being developed and rebooted.
20
23
Article
Hacker News·1y
containers/ramalama: The goal of RamaLama is to make working with AI boring.
RamaLama simplifies local management and serving of AI models using OCI containers. It supports CPU and GPU configurations, uses Podman or Docker, and pulls AI models from registries like HuggingFace, Ollama, and Docker Hub. Users can manage models with simple commands, view container statuses, and easily switch model transports. The tool is available on PyPI, which keeps installation simple.
20
24
Article
NVIDIA Developer·36w
Release v1.9.0 · NVIDIA/warp
Warp 1.9.0 introduces a fully differentiable marching cubes implementation written entirely in Warp, CUDA 13 toolkit compatibility, and new ahead-of-time compilation functions. Performance improvements include graph-capturable linear solvers and automatic tiling for sparse linear algebra and finite element quadrature. Programming model enhancements add better indexing for composite types, direct IntEnum support, local array initialization in kernels, and indexed tile operations for flexible memory access patterns.
16
25
Article
TechCentral·29w
China’s DeepSeek warns of social upheaval from AI
DeepSeek's senior researcher Chen Deli made a rare public appearance at China's World Internet Conference, expressing concerns about AI's long-term societal impact. While optimistic about the technology itself, Chen warned that AI could cause widespread job displacement within 5-10 years and create massive social challenges in 10-20 years. DeepSeek gained global attention in January for releasing a low-cost AI model that outperformed leading US models. The company upgraded its V3 model in September and has become central to China's efforts to build a domestic AI ecosystem, with Chinese chip makers like Cambricon and Huawei developing hardware compatible with DeepSeek's models.
15
