Cloudflare's post on its Gen 13 servers explains how the company doubled edge compute throughput by moving from AMD EPYC Genoa-X processors (with large 3D V-Cache) to high-core-count AMD EPYC Turin 9965 processors. The key challenge was that Turin's 192 cores get far less L3 cache per core (2 MB vs. 12 MB on Genoa-X), which caused severe latency regressions under FL1, Cloudflare's legacy NGINX/LuaJIT-based stack. Hardware tuning alone couldn't solve the problem. The solution was FL2, a complete Rust rewrite of Cloudflare's core request-handling layer built on the Pingora and Oxy frameworks. FL2's leaner memory access patterns sharply reduced its dependence on the L3 cache, cutting the latency penalty by 70% and letting throughput scale linearly with core count. The result: 2x the throughput of Gen 12, 50% better performance per watt, and 60% higher rack throughput, all while meeting latency SLAs.
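
The 2 MB vs. 12 MB figures are simply total L3 divided by core count. The sketch below reproduces that arithmetic. The SKU used for Gen 12 (EPYC 9684X) and both chips' total L3 sizes are assumptions drawn from AMD's public specifications, not from the summary above, and `Cpu` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    """Hypothetical record of a CPU's core count and total L3 size."""
    name: str
    cores: int
    l3_mb: int  # total L3 across all CCDs, in MB

    def l3_per_core_mb(self) -> float:
        # L3 is shared per CCD, and every CCD has the same cores-to-cache
        # ratio, so the chip-wide average equals each core's share.
        return self.l3_mb / self.cores


# Assumed Gen 12 part: EPYC 9684X, 96 cores, 1152 MB L3 (with 3D V-Cache)
genoa_x = Cpu("EPYC 9684X (Genoa-X)", cores=96, l3_mb=1152)
# Assumed Gen 13 part: EPYC 9965, 192 cores, 384 MB L3
turin = Cpu("EPYC 9965 (Turin)", cores=192, l3_mb=384)

for cpu in (genoa_x, turin):
    print(f"{cpu.name}: {cpu.l3_per_core_mb():.0f} MB L3 per core")

ratio = genoa_x.l3_per_core_mb() / turin.l3_per_core_mb()
print(f"Per-core L3 shrinks by {ratio:.0f}x")
```

A 6x smaller per-core cache share means any workload whose hot working set fit in roughly 12 MB per core on Genoa-X will spill to DRAM far more often on Turin. This is consistent with the summary's point that a leaner memory footprint (FL2), rather than hardware tuning, was what recovered latency.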

From blog.cloudflare.com (8 min read)
Table of contents
- What AMD EPYC Turin brings to the table
- Diagnosing the problem with performance counters
- The tradeoff: latency vs. throughput
- Incremental gains with performance tuning
- The opportunity: FL2 was already in progress
- Proving it out: FL2 on Gen 13
- Generational improvement with Gen 13
- Gen 13 + FL2: ready for the edge
