The Register

Nvidia revealed at GTC that it will integrate Groq's language processing units (LPUs) into new LPX rack systems that sit alongside its Vera Rubin GPU racks to dramatically accelerate AI inference. The architecture splits LLM inference into two stages: Rubin GPUs handle the compute-heavy prefill phase, while 256 Groq 3 LPUs per rack handle the bandwidth-heavy decode phase, generating thousands of tokens per second per user. That speed enables pricing as high as $45 per million tokens. Each Groq 3 LPU offers 150 TB/s of memory bandwidth but only 500 MB of on-chip SRAM, so trillion-parameter models require multiple LPX racks ganged together. The move effectively abandons Nvidia's earlier Rubin CPX prefill processor concept. AWS is pursuing a similar hybrid approach, pairing Trainium 3 with Cerebras WSE-3 ASICs. CUDA support for LPUs is not yet native.
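To see why one LPX rack is not enough for a trillion-parameter model, the per-chip figures above can be multiplied out. The following is a rough back-of-envelope sketch, not Nvidia's or Groq's sizing method. It assumes decimal units (1 MB = 10^6 bytes), that only the model weights need to fit in SRAM (KV cache and activations are ignored, so real deployments would likely need more), and a weight precision that the article does not state. The function names are hypothetical.

```python
import math

# Figures quoted in the article
LPUS_PER_RACK = 256
SRAM_PER_LPU_BYTES = 500 * 10**6   # 500 MB, assuming decimal megabytes
BANDWIDTH_PER_LPU_TBPS = 150       # TB/s per Groq 3 LPU


def rack_sram_bytes() -> int:
    """Total on-chip SRAM across one LPX rack."""
    return LPUS_PER_RACK * SRAM_PER_LPU_BYTES


def rack_bandwidth_tbps() -> int:
    """Aggregate memory bandwidth across one LPX rack, in TB/s."""
    return LPUS_PER_RACK * BANDWIDTH_PER_LPU_TBPS


def racks_needed(params: int, bytes_per_param: float) -> int:
    """Minimum racks whose combined SRAM holds the weights alone.

    bytes_per_param is an assumption (e.g. 1.0 for 8-bit weights);
    the article does not say which precision is used.
    """
    weight_bytes = params * bytes_per_param
    return math.ceil(weight_bytes / rack_sram_bytes())


if __name__ == "__main__":
    print(f"SRAM per rack: {rack_sram_bytes() / 10**9:.0f} GB")
    print(f"Bandwidth per rack: {rack_bandwidth_tbps()} TB/s")
    for label, width in [("4-bit", 0.5), ("8-bit", 1.0), ("16-bit", 2.0)]:
        print(f"1T params at {label}: {racks_needed(10**12, width)} racks")
```

Under these assumptions a rack holds 128 GB of SRAM in total, so even with 8-bit weights a trillion-parameter model would span roughly eight racks, which is consistent with the article's point that racks must be ganged together.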

Nvidia slaps Groq into new LPX racks for faster AI response