DeepEP is a communication library for Mixture-of-Experts (MoE) models and expert parallelism (EP). It provides high-throughput, low-latency all-to-all GPU kernels for moving tokens between experts, supports low-precision operations, and includes kernels optimized for asymmetric-domain bandwidth forwarding. The library has been benchmarked for performance, and it supports traffic isolation and adaptive routing for improved network efficiency.
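To make the all-to-all pattern concrete, here is a minimal single-process sketch of the exchange that expert parallelism relies on: tokens are sent to the rank hosting their expert, processed there, and returned to their original positions. This is a conceptual illustration only and does not use DeepEP's API. The function names (`dispatch`, `run_experts`, `combine`, `moe_round_trip`), the top-1 routing, and the contiguous expert-to-rank layout are all assumptions made for the example.

```python
# Conceptual simulation of an MoE all-to-all exchange in plain Python.
# NOT DeepEP's API: every name below is hypothetical, and routing is
# simplified to one expert per token (top-1).

def dispatch(tokens_per_rank, expert_of, experts_per_rank):
    """Send each token to the rank that hosts its expert.

    Assumes experts are laid out contiguously, so expert e lives on
    rank e // experts_per_rank.
    """
    inbox = [[] for _ in tokens_per_rank]
    for src, tokens in enumerate(tokens_per_rank):
        for pos, tok in enumerate(tokens):
            expert = expert_of(tok)
            dst = expert // experts_per_rank
            # Keep (source rank, position) so the result can be routed back.
            inbox[dst].append((src, pos, expert, tok))
    return inbox


def run_experts(inbox, expert_fn):
    """Apply the local expert to every token a rank received."""
    return [
        [(src, pos, expert_fn(expert, tok)) for src, pos, expert, tok in items]
        for items in inbox
    ]


def combine(outbox, tokens_per_rank):
    """Return expert outputs to their source rank and original position."""
    result = [[None] * len(tokens) for tokens in tokens_per_rank]
    for items in outbox:
        for src, pos, value in items:
            result[src][pos] = value
    return result


def moe_round_trip(tokens_per_rank, expert_of, expert_fn, experts_per_rank):
    inbox = dispatch(tokens_per_rank, expert_of, experts_per_rank)
    outbox = run_experts(inbox, expert_fn)
    return combine(outbox, tokens_per_rank)


if __name__ == "__main__":
    # Two ranks, four experts (two per rank); tokens are plain integers.
    tokens = [[0, 1, 2], [3, 4, 5]]
    out = moe_round_trip(
        tokens,
        expert_of=lambda t: t % 4,          # toy router
        expert_fn=lambda e, t: t * 10 + e,  # toy expert computation
        experts_per_rank=2,
    )
    print(out)
```

In a real deployment each rank is a separate GPU process and both exchanges are collective operations over the interconnect, which is the step a library like DeepEP accelerates. Production MoE layers also typically route each token to several experts and sum the weighted outputs, which this sketch omits.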
Table of contents
Performance
Quick start
Network configurations
Interfaces and examples
Notices
License
Citation