Werner Vogels details a decade-long effort by AWS Lambda's networking team to eliminate VPC cold start latency. The team used eBPF to rewrite Geneve tunnel headers on the fly, reducing tunnel setup from 150ms to 200μs. They then tackled SnapStart scaling challenges by pre-creating all 4,000 network namespaces at worker boot (constant work pattern), replacing stateful iptables NAT with stateless eBPF packet mangling (100x NAT setup improvement), reducing root namespace iptables rules from 125,000+ to 144, and batching RTNL lock operations to eliminate queuing. The result: a unified network topology supporting both traditional and snapshot-based workloads at 20x higher density, with 1% CPU savings at Lambda scale. The networking stack was later extracted as a service adopted by Aurora DSQL.
Table of contents
What is a network topology?
The VPC cold start problem
Reimagining our network topology (out of necessity)
One bottleneck at a time
Invisible engineering
This is the job