Towards Data Science is a community-powered publication that showcases work in data science, machine learning and artificial intelligence. Every day newcomers, seasoned researchers and industry practitioners publish tutorials, research notes and real-world case studies that help the field move forward.


A deep dive into the hardware that lets multiple GPUs communicate during AI training. It covers PCIe (bandwidth from Gen4 through Gen6), NVLink (direct GPU-to-GPU links within a node, up to 1.8 TB/s per GPU on Blackwell), NVSwitch (non-blocking all-to-all communication that scales to 256 GPUs), and InfiniBand (inter-node communication via RDMA). It also explains the key design principles: aiming for linear scaling, overlapping computation with communication, and accounting for the sharp performance drop when traffic crosses from intra-node to inter-node links.

AI in Multiple GPUs: How GPUs Communicate