Pinterest's ML team shares a detailed investigation into why new L1 conversion (CVR) models showed strong offline improvements (20–45% LogMAE reduction) but neutral or negative online A/B results. The root causes were identified as: (1) feature coverage gaps where high-impact features existed in training logs but were never

11m read time From medium.com
Table of contents
Introduction
Background: Two Ways to Judge an L1 Model
How We Structured the Investigation
What We Ruled Out Quickly
    1. Offline evaluation issues
    2. Exposure bias and traffic share
    3. Timeouts and serving failures
    Summary
What Actually Broke: Features and Embeddings
    1. Feature O/O discrepancy: training vs. serving
    2. Embedding version skew: query vs. Pin
Beyond Prediction: Funnel and Metric Effects
    1. Funnel alignment
    2. Metric mismatch
Conclusion: O/O as a Design Constraint
Acknowledgments
