Datadog engineers share six hard-won lessons from five years of running eBPF-powered workload protection at scale. The post covers: navigating kernel version and distribution compatibility pitfalls (verifier changes, hook point availability, function inlining); safely capturing kernel data (CO-RE offsets, TOCTOU races,

40m read time · From datadoghq.com
Table of contents
Why we chose eBPF for Workload Protection
  What we evaluated before choosing eBPF
  Why eBPF stood out
Six lessons from running eBPF in production
  1. Navigate the edge cases of eBPF program loading and kernel hook points
  2. Safely capturing and enriching kernel data reliably is harder than it looks
  3. eBPF introduces an attack surface that should be monitored and audited
  4. Kernel resources are shared—account for other eBPF-based tools
  5. Measuring performance impact is a necessary evil and a two-step process
  6. Best practices before rolling out to production—and acknowledging the risks
What’s next?
Closing thoughts
