This practical guide presents a fold-based filter method for robust variable selection in credit scoring models. The approach splits the training data into 4 stratified folds and applies four sequential rules: (1) a Kruskal-Wallis test to drop continuous variables not linked to default, (2) Cramér's V to drop categorical variables only weakly associated with default, (3) Spearman correlation to remove redundant continuous variables, and (4) Cramér's V between categorical pairs to eliminate redundant categorical variables. A variable is kept only if it passes all criteria on every fold, which favors stability over performance on the full dataset alone. The method yields 7 final variables that are auditable, interpretable, and explainable to regulators.
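The four rules can be sketched in Python as follows. This is a minimal illustration rather than the article's own implementation: the function names (`stable_filter`, `drop_redundant`, `cramers_v`, `kw_stat`), the thresholds (`alpha`, `v_min`, `rho_max`, `v_max`), and the tie-breaking choice of keeping the more relevant variable from a redundant pair are all assumptions made for this example. It also assumes each rule is evaluated on the held-out part of each fold.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, kruskal, spearmanr
from sklearn.model_selection import StratifiedKFold


def cramers_v(a, b):
    """Cramér's V between two categorical series (no bias correction)."""
    table = pd.crosstab(a, b)
    k = min(table.shape) - 1
    if k < 1:  # one of the variables is constant on this fold
        return 0.0
    chi2 = chi2_contingency(table, correction=False)[0]
    return float(np.sqrt(chi2 / (table.to_numpy().sum() * k)))


def kw_stat(x, y):
    """Kruskal-Wallis (H, p-value) of x across the classes of y."""
    return kruskal(*[x[y == c] for c in np.unique(y)])


def drop_redundant(ranked, folds, assoc, limit):
    """Greedy pass over variables ranked by relevance (assumed tie-break):
    keep a variable only if its association with every already-kept
    variable stays at or below `limit` on every fold."""
    kept = []
    for var in ranked:
        if all(max(assoc(f[var], f[k]) for f in folds) <= limit for k in kept):
            kept.append(var)
    return kept


def stable_filter(df, target, num_cols, cat_cols, n_folds=4,
                  alpha=0.05, v_min=0.10, rho_max=0.70, v_max=0.50, seed=0):
    """Return variables passing all four rules on every fold.
    All threshold defaults are illustrative assumptions."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    folds = [df.iloc[idx] for _, idx in skf.split(df, df[target])]

    # Rule 1: continuous variables must be linked to default on every fold.
    num_score = {}
    for c in num_cols:
        res = [kw_stat(f[c].to_numpy(), f[target].to_numpy()) for f in folds]
        if all(p < alpha for _, p in res):
            num_score[c] = min(h for h, _ in res)

    # Rule 2: categorical variables must reach a minimum Cramér's V
    # with the target on every fold.
    cat_score = {}
    for c in cat_cols:
        vs = [cramers_v(f[c], f[target]) for f in folds]
        if min(vs) >= v_min:
            cat_score[c] = min(vs)

    def rank(scores):
        # Most relevant first, using the worst-fold score.
        return sorted(scores, key=scores.get, reverse=True)

    def abs_rho(a, b):
        return abs(spearmanr(a, b)[0])

    # Rules 3 and 4: remove redundant continuous and categorical variables.
    return (drop_redundant(rank(num_score), folds, abs_rho, rho_max)
            + drop_redundant(rank(cat_score), folds, cramers_v, v_max))
```

Calling `stable_filter(df, "default", num_cols, cat_cols)` on a training frame with a binary `default` column (a hypothetical column name) returns the surviving variable names. Ranking by the worst-fold score before the redundancy pass means that, within a correlated pair, the variable with the more stable link to default is the one retained.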
Table of contents
The Core Idea: Stability Over Performance
The Dataset
The Filter Method: Four Rules
Image Credits
References