Datadog shares operational practices for running Cilium across hundreds of Kubernetes clusters at enterprise scale. Key recommendations include using native routing over overlays, tuning IPAM parameters with surge allocation to prevent pod scheduling delays, implementing strict upgrade gates with preflight validation and
Table of contents
Avoid IPAM pitfalls at scale
Upgrade practices that keep deployments safe
Monitor control plane and datapath signals to catch issues early
Configure your datapath to be reliable at scale
Lessons learned from running Cilium at scale