Perplexity has developed software optimizations that let trillion-parameter mixture-of-experts (MoE) models run efficiently on older, cheaper GPU hardware over AWS Elastic Fabric Adapter (EFA). The new kernels address memory and network latency challenges by optimizing communication between GPUs across multiple nodes.
From go.theregister.com · 6 min read