Bridging the Gap: Diagnosing Online–Offline Discrepancy in Pinterest’s L1 Conversion Models

Pinterest's ML team shares a detailed investigation into why new L1 conversion (CVR) models showed strong offline improvements (20–45% LogMAE reduction) but neutral or negative online A/B results. The investigation traced the gap to two root causes: (1) feature coverage gaps, where high-impact features existed in training logs but were never onboarded into the L1 embedding serving path; and (2) embedding version skew in two-tower architectures, where the query and Pin towers ran on different model checkpoints in production. Two further effects widened the gap: funnel alignment issues (recall ceilings) and a mismatch between offline loss metrics and online CPA. The team's key takeaway is that online-offline discrepancy should be treated as a design constraint from the start, not as a post-hoc debugging problem.
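Both root causes can, in principle, be caught by a pre-launch consistency check. The following is a minimal sketch, assuming you can list the feature names present in training logs and in the serving path, and that each tower exposes a checkpoint identifier. The `TowerInfo` class, the `audit` helper, and all feature and checkpoint names are hypothetical illustrations, not Pinterest's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TowerInfo:
    """Hypothetical metadata for one tower of a two-tower model."""
    name: str
    checkpoint: str  # identifier of the checkpoint this tower was exported from


def missing_serving_features(training_features, serving_features):
    """Return features present in training logs but absent from the serving path."""
    return sorted(set(training_features) - set(serving_features))


def has_version_skew(query_tower, pin_tower):
    """True if the two towers come from different checkpoints.

    Embeddings from different checkpoints are not guaranteed to live in the
    same vector space, so their dot products may be meaningless.
    """
    return query_tower.checkpoint != pin_tower.checkpoint


def audit(training_features, serving_features, query_tower, pin_tower):
    """Collect human-readable descriptions of both failure modes."""
    issues = []
    missing = missing_serving_features(training_features, serving_features)
    if missing:
        issues.append("features missing at serving: " + ", ".join(missing))
    if has_version_skew(query_tower, pin_tower):
        issues.append(
            f"embedding version skew: {query_tower.name}={query_tower.checkpoint}, "
            f"{pin_tower.name}={pin_tower.checkpoint}"
        )
    return issues


if __name__ == "__main__":
    # Hypothetical inputs: one feature was never onboarded, and the towers disagree.
    training = ["user_age_bucket", "pin_ctr_7d", "advertiser_cvr_30d"]
    serving = ["user_age_bucket", "pin_ctr_7d"]
    query = TowerInfo("query_tower", "ckpt_v12")
    pin = TowerInfo("pin_tower", "ckpt_v11")
    for issue in audit(training, serving, query, pin):
        print(issue)
```

Running such a check as a launch gate, rather than after a disappointing A/B test, is one way to act on the "design constraint" framing; a real system would likely compare feature coverage per request and read checkpoint identifiers from the serving infrastructure itself.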

#machine-learning

#pinterest

#recommendation-systems

Feb 27 • 11 min read • From medium.com

Table of contents

Introduction
Background: Two Ways to Judge an L1 Model
How We Structured the Investigation
What We Ruled Out Quickly
  1. Offline evaluation issues
  2. Exposure bias and traffic share
  3. Timeouts and serving failures
  Summary
What Actually Broke: Features and Embeddings
  1. Feature O/O discrepancy: training vs. serving
  2. Embedding version skew: query vs. Pin
Beyond Prediction: Funnel and Metric Effects
  1. Funnel alignment
  2. Metric mismatch
Conclusion: O/O as a Design Constraint
Acknowledgments
