Testing RDMA over Thunderbolt 5 on a cluster of four Mac Studios with a combined 1.5 TB of unified memory shows significant performance gains when running massive AI models. The M3 Ultra Mac Studio outperforms comparable systems from Nvidia and AMD in CPU, AI inference, and power-efficiency benchmarks. RDMA support in Exo 1.0 enables linear performance scaling across nodes, achieving 30+ tokens/second on trillion-parameter models. The limitations include Thunderbolt 5's four-node maximum, the difficulty of managing a macOS cluster, stability issues with prerelease software, and the lack of standard networking options such as QSFP for larger deployments.
Table of contents
Video
A Mini Mac Rack
M3 Ultra Mac Studio - Baseline
Mini Stack, Maxi Mac
HPL and Llama.cpp
Enabling RDMA
Stability Issues
Unanswered Questions / Topics to Explore Further
Conclusion