DeepEP is a communication library designed for Mixture-of-Experts (MoE) and expert parallelism (EP), providing high-throughput, low-latency all-to-all GPU kernels for efficient data transfer. It supports low-precision operations and includes optimized kernels for asymmetric-domain bandwidth forwarding. The library is tested on
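The dispatch/combine pattern that such all-to-all kernels implement can be illustrated with a minimal CPU simulation. This is a conceptual sketch only, assuming a toy setup where expert `e` lives on rank `e % num_ranks`; the function names and data layout here are illustrative and are not DeepEP's actual API.

```python
# Toy simulation of MoE expert-parallel all-to-all routing (NOT DeepEP's API).

def dispatch(tokens_per_rank, expert_of, num_ranks):
    """Route each token to the rank owning its assigned expert (all-to-all send)."""
    inbox = [[] for _ in range(num_ranks)]
    for src, tokens in enumerate(tokens_per_rank):
        for i, tok in enumerate(tokens):
            dst = expert_of[src][i] % num_ranks  # hypothetical expert-to-rank mapping
            inbox[dst].append((src, i, tok))     # remember origin for the combine step
    return inbox

def combine(inbox, num_ranks, counts):
    """Return expert outputs to each token's source rank and slot (reverse all-to-all)."""
    out = [[None] * counts[r] for r in range(num_ranks)]
    for items in inbox:
        for src, i, tok in items:
            out[src][i] = tok * 2  # stand-in for the expert's computation
    return out

# Two ranks, two tokens each; expert assignments decide the routing.
inbox = dispatch([[1, 2], [3, 4]], expert_of=[[0, 1], [1, 0]], num_ranks=2)
result = combine(inbox, num_ranks=2, counts=[2, 2])
print(result)  # each token comes back to its original rank/slot, processed
```

In the real library these two phases run as fused GPU kernels over NVLink/RDMA rather than Python loops, which is where the throughput and latency gains come from.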

Table of contents

- Performance
- Quick start
- Network configurations
- Interfaces and examples
- Notices
- License
- Citation
