Towards Data Science is a community-powered publication that showcases work in data science, machine learning and artificial intelligence. Every day newcomers, seasoned researchers and industry practitioners publish tutorials, research notes and real-world case studies that help the field move forward.

Part 3 of a series on building credit scoring models focuses on data preprocessing: handling outliers and missing values in borrower data using Python. It covers creating a synthetic time variable to support train, test, and out-of-time (OOT) splits; applying stratified splitting so that each split preserves the default rate and the temporal structure; treating outliers with the IQR method and winsorization; and imputing missing values with conservative strategies that account for the missingness mechanism (MAR vs. MCAR). All preprocessing parameters are computed on the training data only and then applied to the test and OOT sets to prevent data leakage.
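The rule of fitting on training data and applying the result elsewhere can be illustrated with a minimal sketch. It is not the article's own code. The helper names (`fit_iqr_bounds`, `winsorize`, `fit_median`, `impute`), the toy income values, the 1.5 IQR multiplier, and the choice of median imputation are all assumptions made for illustration. It uses only the Python standard library; with pandas the same steps would roughly correspond to `Series.quantile`, `Series.clip`, and `Series.fillna`.

```python
import statistics


def fit_iqr_bounds(values, k=1.5):
    """Compute lower/upper IQR fences from the non-missing values."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize(values, lower, upper):
    """Cap values at the given fences; missing values are left untouched."""
    return [v if v is None else min(max(v, lower), upper) for v in values]


def fit_median(values):
    """Median of the non-missing values, used as the imputation value."""
    return statistics.median([v for v in values if v is not None])


def impute(values, fill):
    """Replace missing values with a precomputed fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical toy data: one feature (income) with an outlier and a gap.
train_income = [20, 30, 40, 50, None, 60, 70, 80, 1000]
test_income = [25, None, 5000, -100]

# 1) Fit every parameter on the training split only.
lower, upper = fit_iqr_bounds(train_income)             # (-15.0, 125.0)
train_capped = winsorize(train_income, lower, upper)
fill_value = fit_median(train_capped)                   # 55.0

# 2) Apply the same fitted parameters to train and to held-out data.
train_clean = impute(train_capped, fill_value)
test_clean = impute(winsorize(test_income, lower, upper), fill_value)

print(lower, upper, fill_value)
print(test_clean)  # [25, 55.0, 125.0, -15.0]
```

The test values never influence the fences or the fill value, so an OOT split would be handled the same way as `test_income` here. Capping at the IQR fences is one common variant of winsorization; percentile-based caps would follow the same fit-then-apply pattern.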

Building Robust Credit Scoring Models (Part 3)