A deep dive into the hardware infrastructure enabling multi-GPU communication for AI training workloads. Covers PCIe (Gen4–Gen6 bandwidth specs), NVLink (intra-node GPU-to-GPU direct communication with up to 1.8 TB/s on Blackwell), NVSwitch (non-blocking all-to-all GPU communication scaling to 256 GPUs), and InfiniBand

5m read time From towardsdatascience.com
Table of contents
Introduction
The Communication Stack
Key Design Principles
Conclusion
