Nvidia's new Vera Rubin architecture, announced at CES 2025, promises 10x lower inference costs and 4x fewer GPUs for training compared with Blackwell. While the Rubin GPU delivers 50 petaflops of 4-bit (FP4) compute, the real innovation lies in six new chips working together through extreme co-design.

The NVLink 6 switch doubles GPU-to-GPU bandwidth to 3,600 GB/s, while expanded in-network computing offloads collective operations such as all-reduce from the GPUs to the switches themselves, cutting both latency and power consumption. The scale-out network pairs ConnectX-9 NICs and BlueField-4 DPUs with Spectrum-6 Ethernet switches, which use co-packaged optics to minimize jitter when connecting racks. Nvidia sees connecting multiple data centers as the next frontier for distributed AI workloads that exceed 100,000 GPUs.
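To put the bandwidth doubling in perspective, here is a back-of-envelope sketch. The 3,600 GB/s figure is from the article; the 1,800 GB/s baseline is Nvidia's published per-GPU NVLink bandwidth for Blackwell. The 70 GB payload is an arbitrary illustrative figure, roughly the FP8 weights of a 70B-parameter model.

```python
# Illustrative only: compares GPU-to-GPU transfer times at the two
# NVLink generations' per-GPU bandwidths (ignores protocol overhead).

NVLINK5_GBPS = 1_800  # GB/s per GPU (Blackwell, published figure)
NVLINK6_GBPS = 3_600  # GB/s per GPU (Rubin, per the article)

def transfer_time_ms(payload_gb: float, bandwidth_gbps: float) -> float:
    """Time in milliseconds to move payload_gb gigabytes at bandwidth_gbps GB/s."""
    return payload_gb / bandwidth_gbps * 1_000

payload_gb = 70  # hypothetical payload, e.g. one replica of large model weights
t5 = transfer_time_ms(payload_gb, NVLINK5_GBPS)
t6 = transfer_time_ms(payload_gb, NVLINK6_GBPS)
print(f"NVLink 5: {t5:.1f} ms, NVLink 6: {t6:.1f} ms")
```

Doubling link bandwidth halves the raw transfer time, which matters most for the frequent, latency-sensitive collectives (all-reduce, all-to-all) that dominate communication in large-scale training.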