Airbnb's observability team identified that high alert noise wasn't a culture problem but a tooling gap in their Observability as Code (OaC) workflow. Without a way to preview alert behavior before deployment, engineers resorted to weeks-long side-by-side deployments to validate changes. The team rebuilt its OaC platform around three key capabilities: text-based diffs in the CLI and in pull requests, a Change Report UI showing side-by-side alert diffs, and bulk backtesting that simulates alerts against historical data using Prometheus's rule engine. The result: development cycles collapsed from weeks to minutes, company-wide alert noise dropped by 90%, and the team successfully migrated 300,000 alerts from a vendor to Prometheus. Key architectural lessons include prioritizing Prometheus compatibility over novelty, enforcing guardrails so that backtesting cannot destabilize production, and owning the full developer surface area to avoid leaky abstractions.
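Backtesting is the least familiar of the three capabilities, so it helps to see the idea in miniature: replay a historical time series through an alert's condition and record when the alert would have fired. The Python sketch that follows is a simplified illustration, not Airbnb's implementation (which, per the summary, reuses Prometheus's own rule engine to evaluate real expressions). It assumes a single threshold rule with a Prometheus-style `for` duration: the alert is pending once the condition becomes true, fires once the condition has held for `for_seconds`, and resolves when the condition clears. `ThresholdRule`, `backtest`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ThresholdRule:
    """Hypothetical simplified alert rule: fires when value > threshold
    has held continuously for at least `for_seconds` (Prometheus-style `for`)."""
    name: str
    threshold: float
    for_seconds: int


def backtest(rule: ThresholdRule, samples: List[Tuple[int, float]]) -> List[Tuple[int, int]]:
    """Replay (timestamp_seconds, value) samples, evaluating the rule once per
    sample, and return the (start, end) intervals during which it would fire."""
    incidents: List[Tuple[int, int]] = []
    pending_since: Optional[int] = None  # when the condition first became true
    firing_since: Optional[int] = None   # when the alert moved from pending to firing
    for ts, value in samples:
        if value > rule.threshold:
            if pending_since is None:
                pending_since = ts
            # Promote pending -> firing once the hold duration has elapsed.
            if firing_since is None and ts - pending_since >= rule.for_seconds:
                firing_since = ts
        else:
            # Condition cleared: close any firing interval and reset state.
            if firing_since is not None:
                incidents.append((firing_since, ts))
            pending_since = None
            firing_since = None
    # An alert still firing at the end of the history is closed at the last sample.
    if firing_since is not None:
        incidents.append((firing_since, samples[-1][0]))
    return incidents


# Hypothetical CPU-utilization history, one sample every 60 s.
values = [0.5, 0.95, 0.96, 0.97, 0.5, 0.92, 0.5, 0.99, 0.99, 0.99, 0.99]
history = [(i * 60, v) for i, v in enumerate(values)]

old = ThresholdRule("HighCPU", threshold=0.9, for_seconds=0)
new = ThresholdRule("HighCPU", threshold=0.9, for_seconds=120)
print(len(backtest(old, history)), len(backtest(new, history)))  # prints: 3 2
```

Running both versions of a rule over the same history is what makes an alert change reviewable before deployment: here, adding a two-minute `for` clause reduces the replay from three incidents to two, because the single-sample spike at 300 s never satisfies the hold duration.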
Table of contents
How we changed our Observability as Code alert review process and cut development cycles from weeks to minutes
Airbnb’s OaC North Star
The problem: Traditional code review can’t validate alert behavior
The solution: Making alert behavior visible