A practical guide to applying Regression Discontinuity Design (RDD) to measure the causal effect of LLM confidence-threshold routing on task completion. Using a 50,000-user synthetic dataset with a known ground-truth effect of +6 percentage points, the tutorial walks through sharp RDD with local linear regression, bandwidth sensitivity sweeps, McCrary density tests for manipulation, quadratic robustness checks, and bootstrap confidence intervals, all implemented in Python with statsmodels. The post also covers when RDD fails (manipulation, co-firing policies, fuzzy thresholds, curvature bias) and points to the rdrobust package for production use.
Table of contents
- Why Threshold Routing is a Natural Experiment
- What Regression Discontinuity Actually Does
- Prerequisites
- Setting Up the Working Example
- When Regression Discontinuity Fails
- What to Do Next