NVIDIA has announced Multipath Reliable Connection (MRC), a new RDMA transport protocol developed in collaboration with Microsoft, OpenAI, AMD, Broadcom, and Intel and released as an open specification through the Open Compute Project. MRC lets a single RDMA connection distribute its traffic across multiple network paths, improving throughput, load balancing, and resilience in large-scale AI training clusters. Deployed on NVIDIA Spectrum-X Ethernet hardware, MRC detects failures at hardware speed and reroutes traffic within microseconds, sustains high GPU utilization under congestion, and supports multiplanar network designs that scale to hundreds of thousands of GPUs. Major deployments include OpenAI's Blackwell-generation clusters, Microsoft's Fairwater data center, and Oracle Cloud Infrastructure's Abilene facility.
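
For readers new to multipath transport, the core idea (one logical connection spreading packets over several paths and steering around a path that has failed) can be illustrated with a toy Python model. This is a generic sketch under stated assumptions: the `MultipathConnection` class, the path names, and the round-robin policy are hypothetical and are not taken from the MRC specification, which the announcement describes as running on Spectrum-X hardware rather than in software.

```python
from dataclasses import dataclass, field


@dataclass
class MultipathConnection:
    """Toy model of one logical connection spraying packets over several paths.

    Hypothetical illustration only: the path names and the round-robin
    policy are assumptions, not details of the MRC specification.
    """
    paths: list[str]
    failed: set[str] = field(default_factory=set)
    _next: int = 0

    def mark_failed(self, path: str) -> None:
        # Stand-in for failure detection: stop scheduling onto this path.
        self.failed.add(path)

    def pick_path(self) -> str:
        # Round-robin over the paths that are still healthy.
        healthy = [p for p in self.paths if p not in self.failed]
        if not healthy:
            raise RuntimeError("no healthy paths remain")
        path = healthy[self._next % len(healthy)]
        self._next += 1
        return path

    def send(self, packets: list[int]) -> list[tuple[int, str]]:
        # Assign each packet sequence number to a path.
        return [(seq, self.pick_path()) for seq in packets]


if __name__ == "__main__":
    conn = MultipathConnection(["plane0", "plane1", "plane2", "plane3"])
    print(conn.send([0, 1, 2, 3]))  # spread across all four paths
    conn.mark_failed("plane1")
    print(conn.send([4, 5, 6]))     # plane1 is no longer used
```

Because every packet carries its own sequence number, the receiver can reassemble the stream no matter which path each packet took; a real transport additionally handles reordering, retransmission, and congestion signals, which this sketch omits.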

From blogs.nvidia.com (4-minute read)
