A practical guide to applying Regression Discontinuity Design (RDD) to measure the causal effect of LLM confidence-threshold routing on task completion. Using a 50,000-user synthetic dataset with a known ground-truth effect of +6 percentage points, the tutorial walks through sharp RDD with local linear regression, bandwidth sensitivity sweeps, McCrary density tests for manipulation, quadratic robustness checks, and bootstrap confidence intervals, all implemented in Python with statsmodels. The post also covers when RDD fails (manipulation, co-firing policies, fuzzy thresholds, curvature bias) and points to the rdrobust package for production use.
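
The core idea of sharp RDD with local linear regression can be sketched in a few lines. This is a minimal illustration, not the tutorial's own code: it uses only the Python standard library rather than statsmodels, and the cutoff of 0.7, the linear baseline completion rate, the choice that scores at or above the cutoff are routed, and the helper names `simulate`, `fit_line`, and `sharp_rdd` are all assumptions made for the example. Only the 50,000-user sample size and the +6 percentage point effect come from the summary above.

```python
import random


def simulate(n=50_000, cutoff=0.7, effect=0.06, seed=0):
    """Synthetic users as (confidence score, completed) pairs.

    Assumption: users with score >= cutoff are routed (treated), and the
    baseline completion probability rises linearly with the score.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        score = rng.random()
        treated = score >= cutoff
        p = 0.40 + 0.30 * score + (effect if treated else 0.0)
        rows.append((score, 1.0 if rng.random() < p else 0.0))
    return rows


def fit_line(points):
    """Ordinary least squares for (x, y) pairs; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd(rows, cutoff, bandwidth):
    """Estimate the jump in the outcome at the cutoff.

    A separate line is fitted on each side of the cutoff, using only
    observations within `bandwidth` of it. The score is centered at the
    cutoff, so each intercept is the fitted outcome at the threshold and
    the difference of intercepts is the estimated discontinuity.
    """
    left = [(s - cutoff, y) for s, y in rows if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in rows if cutoff <= s <= cutoff + bandwidth]
    intercept_left, _ = fit_line(left)
    intercept_right, _ = fit_line(right)
    return intercept_right - intercept_left


rows = simulate()
estimate = sharp_rdd(rows, cutoff=0.7, bandwidth=0.1)
print(f"estimated jump at cutoff: {estimate:+.3f} (simulated truth: +0.060)")
```

Fitting two separate lines is equivalent to a single regression with a treatment indicator, the centered score, and their interaction, which is the form usually estimated with a regression library. Narrowing `bandwidth` reduces bias from curvature at the cost of a noisier estimate, which is why a bandwidth sweep is a standard robustness check.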

20m read time. From freecodecamp.org
Table of contents
Why Threshold Routing is a Natural Experiment
What Regression Discontinuity Actually Does
Prerequisites
Setting Up the Working Example
When Regression Discontinuity Fails
What to Do Next
