Why Most A/B Tests Are Lying to You


Four common statistical errors invalidate most A/B tests: peeking at results early (inflating false positives to 26%), running underpowered tests that exaggerate effect sizes (winner's curse), testing multiple metrics without correction (up to 64% false alarm rate), and confusing statistical significance with practical significance. Switching to Bayesian testing doesn't fix peeking: simulations show false positive rates can hit 80% with fixed posterior thresholds. The solution is a 5-point pre-test checklist, completed before launching: calculate sample size, fix runtime, declare one primary metric, set a practical significance threshold, and choose a testing method (frequentist, Bayesian, or sequential). A worked e-commerce checkout example demonstrates the protocol in practice, contrasting the correct result (+0.6pp lift) against a peeking-driven false winner (+1.1pp on day 3).
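The three quantitative ideas in this summary can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the article's own code: the function names are hypothetical, the 10% baseline rate, 1pp lift, and 14 daily looks are made-up inputs, and the peeking simulation models an A/A test as a running sum of standard normal batches rather than real conversion data. The multiple-metrics figure of about 64% is consistent with 20 independent metrics each tested at alpha = 0.05, though the summary does not say how many metrics it assumes.

```python
import math
import random
from statistics import NormalDist


def family_wise_error_rate(alpha, n_metrics):
    """Chance of at least one false alarm across independent metrics."""
    return 1.0 - (1.0 - alpha) ** n_metrics


def sample_size_per_arm(p_base, lift, alpha=0.05, power=0.80):
    """Standard two-proportion approximation (two-sided test).

    p_base and lift are absolute rates, e.g. 0.10 and 0.01 (hypothetical).
    """
    p_new = p_base + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / lift ** 2)


def peeking_false_positive_rate(n_looks, n_sims=20000, alpha=0.05, seed=0):
    """Simulate A/A tests (no true effect) checked after every batch.

    Assumption: each batch contributes one N(0, 1) increment, so the
    z-statistic at look k is the running sum divided by sqrt(k).
    Returns (rate if you stop at the first significant look,
             rate if you only test once at the end).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    stopped_early = 0
    final_only = 0
    for _ in range(n_sims):
        running = 0.0
        crossed = False
        z = 0.0
        for look in range(1, n_looks + 1):
            running += rng.gauss(0.0, 1.0)
            z = running / math.sqrt(look)
            if abs(z) > z_crit:
                crossed = True  # a peeker would have declared a winner here
        stopped_early += crossed
        final_only += abs(z) > z_crit
    return stopped_early / n_sims, final_only / n_sims


if __name__ == "__main__":
    print(family_wise_error_rate(0.05, 20))   # about 0.64
    print(sample_size_per_arm(0.10, 0.01))    # users needed per arm
    print(peeking_false_positive_rate(14))    # (peeking, single look)
```

Under these assumptions, the single-look rate should land near the nominal 5% while the peeking rate comes out several times higher. The exact inflation depends on how many looks are taken, so the simulated number need not match the 26% quoted above.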

13 min read
From towardsdatascience.com
Table of contents

The Peeking Problem: 26% of Your Winners Aren’t Real
The Power Vacuum: Small Samples, Inflated Effects
The Multiple Comparisons Trap
When “Significant” Doesn’t Mean Significant
The Bayesian Fix That Doesn’t Fix Anything
The Pre-Test Protocol
Worked Example: Checkout Flow Test
What Rigorous Testing Actually Buys You
References
