NVIDIA didn't want me to do this

A hands-on experiment building a cluster of four (then eight) NVIDIA DGX Spark machines to run large language models using tensor parallelism and RDMA over Converged Ethernet (RoCE). The video covers the hardware challenges of QSFP cable types (QSFP28 vs. QSFP56), managed switch configuration, SSH mesh setup, and benchmarking with
