Yelp developed a back-testing engine to safely simulate changes to its ad budget allocation system before deploying them to production. The system replays historical campaign data day by day through the production code, uses ML models (CatBoost) to predict outcomes such as clicks and leads, and applies Bayesian optimization (Scikit-Optimize) to explore parameter spaces. This approach allows faster iteration, reduces the risks of A/B testing, catches bugs early, and provides more accurate impact predictions than traditional methods. The engine integrates eight components, including Git submodules for production code, Redshift for historical data, and MLflow for experiment tracking.
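The replay loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Campaign` fields, the `pacing` parameter, and the functions `allocate_daily_budget` and `predict_clicks` are hypothetical stand-ins for the production allocation code and the trained CatBoost models, and a plain scan over candidate values stands in for Bayesian optimization with Scikit-Optimize.

```python
import math
from dataclasses import dataclass


@dataclass
class Campaign:
    # Hypothetical fields; the real historical data has a richer schema.
    campaign_id: str
    monthly_budget: float
    base_daily_spend: float


def allocate_daily_budget(remaining: float, base_daily_spend: float, pacing: float) -> float:
    """Hypothetical stand-in for the production allocation code under test."""
    return min(remaining, base_daily_spend * pacing)


def predict_clicks(spend: float) -> float:
    """Hypothetical stand-in for a trained model: diminishing returns on spend."""
    return 10.0 * math.sqrt(spend)


def replay(campaigns: list[Campaign], num_days: int, pacing: float) -> float:
    """Replay the period day by day, carrying each campaign's remaining budget forward."""
    remaining = {c.campaign_id: c.monthly_budget for c in campaigns}
    total_clicks = 0.0
    for _day in range(num_days):
        for c in campaigns:
            spend = allocate_daily_budget(remaining[c.campaign_id], c.base_daily_spend, pacing)
            remaining[c.campaign_id] -= spend
            total_clicks += predict_clicks(spend)
    return total_clicks


def search_pacing(campaigns: list[Campaign], num_days: int, candidates: list[float]):
    """Score each candidate with a full replay (a simple scan, not Bayesian optimization)."""
    results = {p: replay(campaigns, num_days, p) for p in candidates}
    best = max(results, key=results.get)
    return best, results


campaigns = [Campaign("a", monthly_budget=100.0, base_daily_spend=10.0)]
best, results = search_pacing(campaigns, num_days=10, candidates=[0.5, 1.0, 2.0])
print(best, results)
```

The point of the sketch is the shape of the loop: state such as remaining budget carries over from one simulated day to the next, so a parameter change is judged over the whole period and not on a single day. In the engine as described, the scan over candidates would be replaced by the optimizer proposing the next parameter set to try.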
Table of contents
Introduction
What is a Back-Testing Engine?
How our Ad Budget Allocation system works
System overview
Insights & Learnings
Conclusion