A walkthrough of exploratory data analysis (EDA) on a Kaggle credit scoring dataset with 32,581 observations and 12 variables. Each variable (including borrower age, income, employment length, home ownership, loan grade, interest rate, and prior default history) is analyzed for its distribution and its relationship to default risk. Continuous variables are discretized into quartile-based intervals. Key findings include: younger and lower-income borrowers default more often, prior default history is a strong predictor, and higher loan grades correlate with lower default rates. Python code using pandas is provided to automate the summary tables and export them to Excel.
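The quartile-based discretization and per-interval default rates described above can be sketched roughly as follows. This is a minimal illustration rather than the article's own code: the column names `person_income` and `loan_status` (1 = default), the helper names `summarize_by_quartile` and `export_summaries`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def summarize_by_quartile(df, var, target="loan_status"):
    """Bin `var` into quartile intervals; report count and default rate per bin."""
    # duplicates="drop" merges bins when quartile edges coincide (heavily tied data).
    bins = pd.qcut(df[var], q=4, duplicates="drop")
    summary = (
        df.groupby(bins, observed=True)[target]
        .agg(n_obs="count", n_defaults="sum", default_rate="mean")
        .reset_index()
    )
    # Store intervals as text so they survive a spreadsheet export.
    summary[var] = summary[var].astype(str)
    return summary


def export_summaries(tables, path):
    """Write one sheet per summary table (needs an Excel engine such as openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for name, table in tables.items():
            # Excel limits sheet names to 31 characters.
            table.to_excel(writer, sheet_name=name[:31], index=False)


# Synthetic stand-in for the real dataset (assumed column names).
rng = np.random.default_rng(0)
income = rng.lognormal(mean=11, sigma=0.5, size=1000)
# Hypothetical relationship: default probability falls as income rises.
p_default = np.clip(0.5 - 0.1 * np.log(income / income.min()), 0.02, 0.9)
df = pd.DataFrame({
    "person_income": income,
    "loan_status": rng.binomial(1, p_default),
})

income_summary = summarize_by_quartile(df, "person_income")
print(income_summary)
# export_summaries({"person_income": income_summary}, "eda_summary.xlsx")
```

The same helper can be applied to each continuous variable in turn, with the resulting tables collected in a dictionary and passed to `export_summaries`; categorical variables such as loan grade could be grouped directly without `pd.qcut`.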