GreenBoost is a new open-source Linux kernel module, licensed under GPLv2, that extends the usable VRAM of NVIDIA GPUs by transparently pooling system RAM and NVMe storage. It works alongside NVIDIA's official drivers through two components: a kernel module that allocates pinned DDR4 pages and exports them as DMA-BUF descriptors, and a CUDA shim library, injected via LD_PRELOAD, that intercepts cudaMalloc calls and redirects large allocations (such as LLM KV caches and model weights) to the extended memory pool. This allows running LLMs that are larger than the card's VRAM, such as a 31.8GB model on a 12GB RTX 5070, without CPU offloading penalties or quality-reducing quantization. The project also handles Ollama's internal dlopen/dlsym usage so that the expanded memory capacity is reported correctly.
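
To make the LD_PRELOAD interception concrete, here is a minimal sketch of how a shim could wrap cudaMalloc and route large requests elsewhere. It is not GreenBoost's actual code: the 256 MiB threshold, the `gb_should_redirect` helper, and the `gb_pool_alloc` placeholder (which always fails here) are assumptions made for illustration, and the real project's pool allocation through DMA-BUF is not shown. Only `dlsym` with `RTLD_NEXT` and the `cudaMalloc(void **, size_t)` signature are standard.

```c
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stddef.h>

/* Assumption: illustrative cutoff, not a value taken from GreenBoost. */
#define GB_REDIRECT_THRESHOLD ((size_t)256 << 20)   /* 256 MiB */

/* cudaError_t is an enum; int is used here so CUDA headers are not required. */
typedef int (*cuda_malloc_fn)(void **, size_t);

/* Hypothetical policy: send allocations at or above the threshold to the pool. */
int gb_should_redirect(size_t size, size_t threshold)
{
    return size >= threshold;
}

/* Hypothetical placeholder for the extended pool. It always reports failure,
 * so this sketch falls through to the real cudaMalloc. */
static int gb_pool_alloc(void **dev_ptr, size_t size)
{
    (void)dev_ptr;
    (void)size;
    return -1;
}

/* Interposed symbol: with LD_PRELOAD, the dynamic linker resolves calls to
 * cudaMalloc here before it reaches the CUDA runtime library. */
int cudaMalloc(void **dev_ptr, size_t size)
{
    static cuda_malloc_fn real_cuda_malloc;

    /* Look up the next definition of cudaMalloc in the library search order. */
    if (!real_cuda_malloc)
        real_cuda_malloc = (cuda_malloc_fn)dlsym(RTLD_NEXT, "cudaMalloc");

    /* Try the extended pool first for large requests. */
    if (gb_should_redirect(size, GB_REDIRECT_THRESHOLD) &&
        gb_pool_alloc(dev_ptr, size) == 0)
        return 0;              /* cudaSuccess */

    if (!real_cuda_malloc)
        return 2;              /* cudaErrorMemoryAllocation */

    return real_cuda_malloc(dev_ptr, size);
}
```

A shim like this would typically be built with `gcc -shared -fPIC -o libshim.so shim.c -ldl` and loaded with `LD_PRELOAD=./libshim.so` in front of the target command (the file names are hypothetical). Symbol interposition only catches calls that go through the dynamic linker; programs that load the CUDA libraries with dlopen/dlsym bypass it, which is why the summary notes separate handling for Ollama.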

3m read time, from phoronix.com