A walkthrough of exploratory data analysis (EDA) on a Kaggle credit scoring dataset with 32,581 observations and 12 variables. Each variable — including borrower age, income, employment length, home ownership, loan grade, interest rate, and prior default history — is analyzed for its distribution and relationship to default risk. Continuous variables are discretized into quartile-based intervals. Key findings include: younger and lower-income borrowers default more, prior default history is a strong predictor, and higher loan grades correlate with lower default rates. Python code using pandas is provided to automate the summary tables and export them to Excel.
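The quartile-based summary described above can be sketched roughly as follows. This is an illustrative sketch, not the article's own code: the column names `person_income` and `loan_status` (with 1 meaning default), the helper name `default_rate_by_quartile`, and the toy data are all assumptions made for the example.

```python
import pandas as pd


def default_rate_by_quartile(df, feature, target="loan_status"):
    """Bin `feature` into quartile intervals and summarize defaults per bin.

    Assumes `target` is a 0/1 column where 1 marks a default.
    """
    # pd.qcut splits on the empirical quartiles; duplicates="drop" merges
    # bins when repeated values make two quartile edges coincide.
    bins = pd.qcut(df[feature], q=4, duplicates="drop")
    summary = (
        df.groupby(bins, observed=True)[target]
        .agg(n_obs="count", n_default="sum", default_rate="mean")
        .reset_index()
    )
    # Store the intervals as text so the table exports cleanly.
    summary[feature] = summary[feature].astype(str)
    return summary


# Toy data, made up purely to exercise the helper.
toy = pd.DataFrame({
    "person_income": [10, 20, 30, 40, 50, 60, 70, 80],
    "loan_status": [1, 1, 1, 0, 0, 1, 0, 0],
})
table = default_rate_by_quartile(toy, "person_income")
print(table)

# Export to Excel (requires an engine such as openpyxl to be installed):
# table.to_excel("summary.xlsx", index=False)
```

Calling the helper once per continuous variable and writing each result to its own sheet is one way to produce the kind of automated summary tables the article mentions.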

18m read time
From towardsdatascience.com
Table of contents
Descriptive Statistics of the Modeling Dataset
Conclusion
References
