DrP is Meta's automated root cause analysis platform, which programmatically investigates incidents in large-scale systems. It provides an SDK for writing investigation playbooks (called analyzers), a scalable backend for executing them, and integrations with alerting and incident management tools. Used by over 300 teams at Meta, DrP runs 50,000 analyses daily and has reduced mean time to resolve (MTTR) by 20-80%. The platform includes ML algorithms for anomaly detection, time series correlation, and dimension analysis, and supports automated post-processing that triggers mitigation actions. Meta plans to evolve DrP into an AI-native platform as part of its broader AI4Ops vision.
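To make the idea of an analyzer more concrete, the following is a minimal sketch of what an investigation playbook combining anomaly detection and time series correlation might look like. It does not use the DrP SDK, whose API is not described here: `Finding`, `is_anomalous`, `pearson`, and `analyze` are hypothetical names, the z-score check and Pearson correlation are stand-ins for whatever algorithms DrP actually ships, and the metric values are made up for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, pstdev


@dataclass
class Finding:
    """One result an analyzer reports (hypothetical type, not DrP's)."""
    summary: str
    score: float


def is_anomalous(series, baseline_len, threshold=3.0):
    """Flag the series if any point after the baseline deviates from the
    baseline mean by more than `threshold` baseline standard deviations."""
    baseline = series[:baseline_len]
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        return any(x != mu for x in series[baseline_len:])
    return any(abs(x - mu) / sigma > threshold for x in series[baseline_len:])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def analyze(alert_metric, candidates, baseline_len):
    """Toy analyzer: if the alerting metric is anomalous, rank candidate
    metrics by the absolute value of their correlation with it."""
    if not is_anomalous(alert_metric, baseline_len):
        return []
    findings = [
        Finding(
            summary=f"{name} correlates with the alerting metric",
            score=abs(pearson(alert_metric, series)),
        )
        for name, series in candidates.items()
    ]
    return sorted(findings, key=lambda f: f.score, reverse=True)


# Illustrative data only: an error rate that jumps after five samples.
error_rate = [1.0, 1.1, 0.9, 1.0, 1.0, 5.0, 6.0, 5.5]
candidates = {
    "deploy_progress": [0, 0, 0, 0, 0, 1, 1, 1],
    "cpu_util": [40, 42, 41, 39, 40, 41, 40, 42],
}
findings = analyze(error_rate, candidates, baseline_len=5)
for f in findings:
    print(f"{f.score:.2f}  {f.summary}")
```

In this toy data the deployment signal ranks above CPU utilization because it steps up at the same time as the error rate. A production analyzer would presumably pull metrics from monitoring systems and hand its findings to post-processing steps, details this sketch leaves out.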