Causal inference for LLM-based features starts with one question editors ask before they ship anything: Did the change actually move the metric, or did the metric just move? Let's say that your team b

freeCodeCamp

A practical guide to applying Regression Discontinuity Design (RDD) to measure the causal effect of LLM confidence-threshold routing on task completion. Using a 50,000-user synthetic dataset with a known ground-truth effect of +6 percentage points, the tutorial walks through sharp RDD with local linear regression, bandwidth sensitivity sweeps, McCrary density tests for manipulation, quadratic robustness checks, and bootstrap confidence intervals, all implemented in Python with statsmodels. The post also covers when RDD fails (manipulation, co-firing policies, fuzzy thresholds, curvature bias) and points to the rdrobust package for production use.

Product Experimentation with Regression Discontinuity: How an LLM Confidence Threshold Creates a Natural Experiment in Python

Why Threshold Routing Is a Natural Experiment

What Regression Discontinuity Actually Does
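
As a rough illustration of the idea, here is a minimal sketch of a sharp RDD estimate: fit one straight line to the users just below the threshold, another to the users just above it, and read off the gap between the two fitted lines at the threshold. It uses only the Python standard library rather than statsmodels so it runs anywhere. The cutoff of 0.7, the bandwidth of 0.25, the baseline completion curve, and the choice to treat users at or above the cutoff are all assumptions made for this example; only the 50,000-user sample size and the +6 percentage point effect come from the description above.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd(scores, outcomes, cutoff, bandwidth):
    """Local linear sharp RDD with a uniform kernel.

    Scores are centered on the cutoff, so each side's intercept is the
    fitted outcome exactly at the threshold. The estimate is the
    difference between the right and left intercepts.
    """
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Hypothetical synthetic data: every constant below except N_USERS and
# TRUE_EFFECT is an assumption for illustration.
rng = random.Random(0)
N_USERS = 50_000
CUTOFF = 0.7          # assumed confidence threshold
TRUE_EFFECT = 0.06    # +6 percentage points

scores = [rng.random() for _ in range(N_USERS)]
outcomes = []
for s in scores:
    # Assumed baseline: completion probability rises linearly with confidence,
    # plus a jump for users at or above the cutoff.
    p_complete = 0.40 + 0.30 * s + (TRUE_EFFECT if s >= CUTOFF else 0.0)
    outcomes.append(1 if rng.random() < p_complete else 0)

estimate = sharp_rdd(scores, outcomes, CUTOFF, bandwidth=0.25)
print(f"Estimated jump at the cutoff: {estimate:.3f} (true effect: {TRUE_EFFECT})")
```

Fitting two separate lines is equivalent to a single regression with a treatment dummy and a treatment-by-score interaction, which is the form you would typically pass to statsmodels. The sketch reports a point estimate only; it gives no standard error and does not check for manipulation or curvature.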