Calendar-based MLOps retraining schedules assume models decay smoothly like the Ebbinghaus forgetting curve, but production data tells a different story. Fitting an exponential decay model to 555,000 fraud transactions returned R² = −0.31, worse than predicting the mean, revealing that fraud detection models fail in sudden shocks rather than gradual decline. A two-regime classification framework is introduced: 'smooth' (R² ≥ 0.4, where scheduled retraining works) and 'episodic' (R² < 0.4, where event-driven shock detection is needed). Practical Python code is provided for diagnosing which regime a model is in and implementing shock detection mechanisms including rolling-mean drop alerts, volume-weighted recall, and two-consecutive-week retrain triggers.

17m read timeFrom towardsdatascience.com
Post cover image
Table of contents
TL;DRIt Started With One WeekThe Assumption Nobody QuestionsThe Experiment26 Weeks of Production PerformanceWhat −0.31 Actually MeansTwo Regimes of Model ForgettingWhy Fraud Detection Is Always EpisodicThe Diagnostic FrameworkThe Full Diagnostic ReportOn the Choice of Baseline MethodWhat This Means for MLOps PracticeLimitationsReproducing This AnalysisConclusionDisclosureReferences

Sort: