Mount Mayhem at Netflix: Scaling Containers on Modern CPUs
Netflix engineers diagnosed a severe container launch bottleneck after migrating from a virtual kubelet+Docker runtime to kubelet+containerd with per-container user namespaces. The new runtime uses kernel idmap mounts, creating one mount per image layer — O(n) mount operations per container — all competing for global VFS mount locks. On r5.metal instances (dual-socket, multi-NUMA), this caused 30-second health check timeouts and system lockups. Deep profiling with perf and Intel's Top-down Microarchitecture Analysis (TMA) revealed that 95.5% of pipeline slots were stalled on contested memory accesses, with NUMA remote-memory latency and hyperthreading amplifying the contention. Benchmarks across instance types showed AMD's distributed chiplet cache architecture (m7a) scaling far better than Intel's centralized mesh (m7i), and disabling hyperthreading improved latency by 20-30%. The software fix, contributed upstream to containerd, idmap-mounts the common parent directory of all layers instead of each layer individually, reducing mount operations from O(n) to O(1) per container and eliminating the global lock as a bottleneck.
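The essence of the fix can be sketched with a small model. This is illustrative only — not the actual containerd code — and the paths and helper names are hypothetical; it simply counts how many idmap mount operations (each of which must take the global VFS mount lock) are issued under the old per-layer scheme versus the upstreamed common-parent scheme:

```python
import os

def idmap_mounts_per_layer(layers):
    """Old scheme: one idmap mount per image layer -> O(n) lock-holding ops."""
    # Each entry stands in for one mount syscall contending on the global lock.
    return [f"idmap-mount {layer}" for layer in layers]

def idmap_mount_common_parent(layers):
    """Upstreamed fix: idmap-mount the layers' shared parent once -> O(1)."""
    parent = os.path.commonpath(layers)
    return [f"idmap-mount {parent}"]

# Hypothetical snapshot layout for a 64-layer image.
layers = [f"/var/lib/containerd/snapshots/{i}/fs" for i in range(64)]

print(len(idmap_mounts_per_layer(layers)))    # 64 mounts, O(n)
print(len(idmap_mount_common_parent(layers))) # 1 mount, O(1)
```

With n layers, the per-layer scheme serializes n lock acquisitions per container launch (multiplied across concurrent launches), while the common-parent scheme takes the lock once, which is why the contention disappears regardless of image depth.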