LoanDefaultRiceDatathon

Overview

This project was built for the 2022 Rice University Datathon, a 24-hour data science challenge for teams of up to four. Our team worked with a dataset of loan defaults: we explored the data, created visuals, predicted loan default, and predicted interest rate.

In this project, we worked through the full data science pipeline in the following steps:

Cleaning data
EDA
Creating visualizations
Feature engineering
Modeling
Model evaluation

The skills and models that we used in this project were:

Logistic Regression
SMOTE oversampling
Linear Regression
Random Forest Regression

Cleaning

In the cleaning section, we had to deal with missing values in 13 columns. For each of these columns, we worked out what a missing value meant and filled it with an appropriate value. In many cases, a missing value stood in for 0 or for a specific category (such as 'Accepted' for denial_reason).
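The filling strategy described above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the toy rows and the `num_late_payments` column are hypothetical, and only `denial_reason` and the 'Accepted' fill value come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only denial_reason is a column named in this README.
loans = pd.DataFrame({
    "denial_reason": ["Low credit score", None, None, "High debt"],
    "num_late_payments": [2.0, np.nan, 1.0, np.nan],  # hypothetical column
})

# A missing denial_reason means the loan was not denied, so label it explicitly.
loans["denial_reason"] = loans["denial_reason"].fillna("Accepted")

# For count-like columns, a missing value is assumed to mean zero.
loans["num_late_payments"] = loans["num_late_payments"].fillna(0)

# Total number of missing cells left after filling.
remaining_missing = int(loans.isna().sum().sum())
print(remaining_missing)
```

Filling column by column keeps each assumption visible, which matters when a missing value means something different in every column.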

Heatmap of missing values after cleaning

EDA

For the EDA portion, we focused on understanding and visualizing the data in creative and meaningful ways. We started with the broadest visualizations and then narrowed in on specific relationships. Below are the visualizations we created:

We started by looking broadly at all correlations.
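A broad correlation pass like this one might look as follows in pandas. The column names and the synthetic data are assumptions made for illustration; the real dataset's columns are not listed in this README.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; all column names here are hypothetical.
rng = np.random.default_rng(0)
n = 200
loan_amount = rng.normal(10_000, 2_000, n)
frame = pd.DataFrame({
    "loan_amount": loan_amount,
    "installment": loan_amount / 36 + rng.normal(0, 5, n),
    "annual_income": rng.normal(60_000, 15_000, n),
})

# Pearson correlation between every pair of numeric columns.
corr = frame.corr()

# Keep the upper triangle so each pair appears once, then rank by |r|.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(key=np.abs, ascending=False)
print(pairs.head())
```

Ranking the pairs by absolute correlation is one way to decide which relationships deserve a closer look in later plots.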

We then got more specific and focused on fewer relationships.

We then took a sample of the entire data set and looked at individual relationships.
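Sampling keeps pairwise plots readable and fast on a large table. A minimal sketch, assuming a pandas DataFrame with hypothetical columns and an arbitrary sample size of 1,000 rows:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the full dataset; column names are hypothetical.
rng = np.random.default_rng(0)
full = pd.DataFrame({
    "loan_amount": rng.normal(10_000, 2_000, 50_000),
    "interest_rate": rng.normal(12, 3, 50_000),
})

# Draw rows without replacement; a fixed seed makes the sample reproducible.
sample = full.sample(n=1_000, random_state=0)
print(sample.describe())
```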

Modeling

To model this data set, we first focused on predicting the acceptance of the loan.

We used logistic regression for this task, producing the following results:
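For readers who want to reproduce this kind of classifier, here is a minimal scikit-learn sketch. It runs on synthetic, imbalanced data from `make_classification`, not on the loan dataset, and the split ratio and `max_iter` value are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary problem with roughly an 85/15 class split.
X, y = make_classification(
    n_samples=2000, n_features=8, weights=[0.85, 0.15], random_state=0
)

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
```

With imbalanced classes, per-class precision and recall from the report are usually more informative than overall accuracy.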

Because the classes were imbalanced, we used SMOTE oversampling to try to balance them.
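SMOTE creates synthetic minority-class rows by interpolating between a minority point and one of its nearest minority neighbours. The NumPy function below is a hand-written sketch of that idea for illustration only; `smote_sketch` is a hypothetical helper, and a real project would more likely use the `SMOTE` class from the imbalanced-learn library.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)  # random minority rows
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one of their k neighbours
    gap = rng.random((n_new, 1))  # position along the connecting segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Synthetic two-class data: 90 majority rows, 10 minority rows.
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(90, 2))
X_minor = rng.normal(3.0, 1.0, size=(10, 2))

# Generate enough synthetic rows to match the majority class size.
X_new = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced = np.vstack([X_major, X_minor, X_new])
y_balanced = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_new)))
print(np.bincount(y_balanced))
```

Oversampling is normally applied to the training split only, so that the test set still reflects the original class balance.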

After this, we moved on to predicting interest rate, starting with linear regression. This model achieved a testing error of 0.97.
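A test error for a linear model can be measured on a held-out split, as sketched below with scikit-learn. The data is synthetic, and the choice of RMSE is an assumption, since the text does not say which error metric produced the reported figure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic features and a linear target with unit-variance noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
interest_rate = 10 + X @ np.array([1.5, -0.8, 0.3]) + rng.normal(0, 1.0, 500)

X_train, X_test, y_train, y_test = train_test_split(
    X, interest_rate, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Root mean squared error on rows the model never saw during fitting.
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(f"test RMSE: {rmse:.2f}")
```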

We finished our project by trying random forest regression to predict the interest rate. However, it did not produce better results than our linear model.
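One way to compare the two regressors fairly is to fit both on the same split and score them with the same metric. The sketch below does this on synthetic data; the hyperparameters and the RMSE metric are assumptions, not the settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data where the target depends linearly on the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = 10 + X @ np.array([1.5, -0.8, 0.3, 0.0]) + rng.normal(0, 1.0, 600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Fit each model on the same training rows and score it on the same test rows.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(rmse)
```

When the underlying relationship is close to linear, a forest often fails to beat the linear fit, which would be consistent with the outcome described above.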
