Perplexity AI has released TransferEngine, an open-source library that lets trillion-parameter language models run across different cloud providers' GPU hardware at full speed. The software addresses vendor lock-in by providing a single, portable interface for GPU-to-GPU communication that works over both Nvidia ConnectX and AWS Elastic Fabric Adapter (EFA) network interfaces. That portability lets companies serve massive models such as DeepSeek V3 and Kimi K2 on existing H100 and H200 systems instead of buying expensive next-generation hardware. TransferEngine reaches 400 Gbps throughput using RDMA, and it already powers Perplexity's production AI search engine, where it handles disaggregated inference, reinforcement learning, and Mixture-of-Experts routing.
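To make the portability idea concrete, here is a minimal sketch of what a vendor-neutral transfer abstraction can look like. All names below (`RdmaBackend`, `ConnectXBackend`, `EfaBackend`, the toy `TransferEngine` class) are hypothetical illustrations, not Perplexity's actual API; real backends would issue one-sided RDMA writes via ibverbs (ConnectX) or libfabric (EFA), which are simulated here with in-memory buffers.

```python
from abc import ABC, abstractmethod

class RdmaBackend(ABC):
    """Hypothetical backend interface: one-sided RDMA-style writes
    into a remote buffer (simulated here with plain byte buffers)."""
    name: str

    @abstractmethod
    def write(self, local: bytes, remote: bytearray, offset: int) -> None:
        ...

class ConnectXBackend(RdmaBackend):
    name = "connectx"
    def write(self, local: bytes, remote: bytearray, offset: int) -> None:
        # Stand-in for an ibverbs RDMA WRITE on ConnectX NICs.
        remote[offset:offset + len(local)] = local

class EfaBackend(RdmaBackend):
    name = "efa"
    def write(self, local: bytes, remote: bytearray, offset: int) -> None:
        # Stand-in for a libfabric fi_write on AWS EFA.
        remote[offset:offset + len(local)] = local

class ToyTransferEngine:
    """Toy engine exposing one write() API regardless of the fabric."""
    def __init__(self, backend: RdmaBackend):
        self.backend = backend

    def write(self, local: bytes, remote: bytearray, offset: int = 0) -> None:
        self.backend.write(local, remote, offset)

# Usage: the caller's code is identical on either fabric.
remote_buf = bytearray(16)
engine = ToyTransferEngine(EfaBackend())
engine.write(b"expert weights!!", remote_buf)
```

The point of the sketch is the shape of the design: callers program against one write-oriented interface, and the NIC-specific details live entirely behind the backend, which is what makes the same inference code portable across clouds.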

4m read time · From infoworld.com
