DeepEP is a communication library designed for Mixture-of-Experts (MoE) models and expert parallelism (EP), providing high-throughput, low-latency all-to-all GPU kernels (MoE dispatch and combine) for efficient data transfer. It supports low-precision operations, including FP8, and includes optimized kernels for asymmetric-domain bandwidth forwarding, such as forwarding data from the NVLink domain to the RDMA domain. The library also supports traffic isolation and adaptive routing for improved network efficiency.
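To make the dispatch/combine pattern concrete, here is a minimal single-process sketch of what an MoE all-to-all does conceptually: tokens are routed to experts, grouped by the EP rank that hosts each expert (dispatch), and later restored to their original order (combine). This is an illustration of the communication pattern only, not DeepEP's actual API; the function names and layout are assumptions for the example.

```python
import numpy as np

def dispatch(tokens, expert_ids, num_ranks, experts_per_rank):
    """Group tokens by destination EP rank (the 'dispatch' all-to-all).

    Returns the permuted tokens, the permutation used, and the
    per-rank send counts (the split sizes an all-to-all would use).
    """
    dest_rank = expert_ids // experts_per_rank
    order = np.argsort(dest_rank, kind="stable")  # stable keeps token order within a rank
    counts = np.bincount(dest_rank, minlength=num_ranks)
    return tokens[order], order, counts

def combine(processed, order):
    """Scatter expert outputs back to the original token order (the 'combine' step)."""
    out = np.empty_like(processed)
    out[order] = processed
    return out

# 8 tokens, 4 experts spread over 2 ranks (experts 0-1 on rank 0, 2-3 on rank 1).
tokens = np.arange(8, dtype=np.float32).reshape(8, 1)
expert_ids = np.array([3, 0, 2, 1, 3, 0, 1, 2])
sent, order, counts = dispatch(tokens, expert_ids, num_ranks=2, experts_per_rank=2)
restored = combine(sent, order)  # identity here, since no expert transform is applied
```

In a real EP setup the `counts` array drives the variable-sized all-to-all exchange between GPUs, and the expert computation happens between dispatch and combine on each rank.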

Table of contents
- Performance
- Quick start
- Network configurations
- Interfaces and examples
- Notices
- License
- Citation
