Part 3 of a credit scoring model series focuses on data preprocessing: handling outliers and missing values in borrower data using Python. It covers creating a synthetic time variable to support train/test/out-of-time (OOT) splits, applying stratified splitting so that each split preserves the default rate and the temporal structure, treating outliers with IQR-based winsorization, and imputing missing values with conservative strategies, with attention to the missingness mechanism (MAR vs. MCAR). All preprocessing parameters are computed on the training data only and then applied unchanged to the test and OOT sets to prevent data leakage.
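As a rough illustration of the "fit on train, apply everywhere" rule, here is a minimal sketch in plain Python. It is not the series' own code: the function names, the toy income values, the 1.5 IQR multiplier, and the choice of the training median as the fill value are all assumptions made for this example.

```python
import statistics

def fit_iqr_bounds(values, k=1.5):
    """Compute IQR fences from training values only; None marks a missing entry.

    k=1.5 is the conventional multiplier, assumed here for illustration.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def fit_median(values):
    """Median of the observed training values, used as the fill value (an assumption)."""
    return statistics.median(v for v in values if v is not None)

def apply_preprocessing(values, lower, upper, fill_value):
    """Winsorize observed values to the fitted fences and fill missing ones.

    Every parameter comes from the training split, so nothing is
    re-estimated on the test or OOT data.
    """
    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(fill_value)
        else:
            cleaned.append(min(max(v, lower), upper))
    return cleaned

# Hypothetical toy data: one extreme value and one missing value per split.
train_income = [30, 35, 40, 45, 50, None, 55, 60, 500]
test_income = [20, None, 1000]

# Fit on the training split only.
lower, upper = fit_iqr_bounds(train_income)
fill_value = fit_median(train_income)

# Apply the same fitted parameters to every split.
train_clean = apply_preprocessing(train_income, lower, upper, fill_value)
test_clean = apply_preprocessing(test_income, lower, upper, fill_value)
```

The test values are capped and filled using fences and a median that were never influenced by the test data, which is the leakage guard the summary describes. The same pattern carries over to an OOT split, and to pandas or scikit-learn if those are used instead.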