Inspiration

Loans are useful in many situations in life, such as purchasing a car or a home. As younger individuals with aspirations that may one day involve taking out a loan, we asked ourselves: how likely is it to be approved for a mortgage or loan, and what factors make someone more or less likely to be approved? In recent years, particularly with the recent COVID pandemic, home and vehicle prices have risen to unprecedented levels, which may put a home or car out of reach for many individuals.

What it does

Mortgage Loan Analysis, as the name suggests, analyzes various non-demographic attributes of a loan application to predict whether the applicant will be accepted or rejected. We don't include demographic data because the applicant usually has no control over it, and including it would introduce discriminatory bias. Using information extracted from the data set, we trained our machine learning algorithms to establish the correlation between application acceptance and applicant age, income, debt-to-income ratio, loan amount, and the type of occupancy the applicant intends for the property.

How we built it

We picked 5 attributes from the provided mortgage data set and saved them to a separate *.csv file, so that null values in attributes our model ignores would not cause extra data loss. Using the pandas library, we preprocessed the data to drop any applicants with null values, which might skew our data set. Some attributes were categorical data given as intervals; we used ordinal encoding to convert those into discrete numeric data for training and testing our model. We also had one string attribute, which we encoded using one-hot encoding to obtain numeric values for processing. With this clean data, we split the data into two groups, 80% and 20%, for training and validating our model, and trained the model to establish a correlation between the attributes and mortgage application acceptance.
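The preprocessing steps described above could look roughly like the following sketch. It is not our original notebook: the column names, interval labels, and toy rows are hypothetical stand-ins for the reduced CSV, and it assumes the larger 80% share is used for fitting and that the one-hot encoded string attribute is the occupancy type.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical interval labels, listed from lowest to highest.
AGE_ORDER = ["<25", "25-34", "35-44", "45-54"]
DTI_ORDER = ["<20%", "20%-<30%", "30%-<36%", "36%-<50%"]

# Toy stand-in for the reduced CSV (5 attributes plus the outcome).
raw = pd.DataFrame({
    "applicant_age": ["25-34", "35-44", "<25", None, "45-54", "25-34",
                      "35-44", "<25", "45-54", "25-34", "35-44", "<25"],
    "income": [54, 88, 31, 70, None, 62, 95, 40, 120, 58, 77, 45],
    "debt_to_income_ratio": ["20%-<30%", "30%-<36%", "<20%", "36%-<50%",
                             "20%-<30%", "<20%", "30%-<36%", "36%-<50%",
                             "20%-<30%", "<20%", "30%-<36%", "36%-<50%"],
    "loan_amount": [210, 340, 120, 260, 400, 230, 380, 150, 450, 200, 310, 170],
    "occupancy_type": ["principal", "second", "investment", "principal",
                       "second", "principal", "investment", "second",
                       "principal", "investment", "principal", "second"],
    "accepted": [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0],
})


def preprocess(df):
    """Drop incomplete rows, then encode the non-numeric attributes."""
    clean = df.dropna().copy()

    # Ordinal encoding keeps the natural order of the interval categories.
    ordinal_cols = ["applicant_age", "debt_to_income_ratio"]
    encoder = OrdinalEncoder(categories=[AGE_ORDER, DTI_ORDER])
    encoded = encoder.fit_transform(clean[ordinal_cols])
    for i, col in enumerate(ordinal_cols):
        clean[col] = encoded[:, i]

    # One-hot encoding for the unordered string attribute.
    return pd.get_dummies(clean, columns=["occupancy_type"], dtype=int)


clean = preprocess(raw)
X = clean.drop(columns=["accepted"])
y = clean["accepted"]

# 80/20 split; random_state only makes the sketch reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

Ordinal encoding suits the interval attributes because their categories have a natural order, while one-hot encoding avoids implying an order among occupancy types.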

Using Matplotlib, we visualized the data and found that, other than debt-to-income ratio, there isn't any significant correlation between acceptance and the other non-demographic factors. After using this visualization to establish our hypothesis, we trained our model on the data set we created.

To evaluate the model, we applied 4 types of algorithms. We used the Logistic Regression model to fit a line of best fit to the log-odds values and calculate the acceptance rate for mortgage applications. The F1, precision, and recall scores for this test were very high, which suggested that the non-demographic factors we accounted for didn't play much of a role in whether an application was accepted or rejected. Similarly, we ran Random Forest, Decision Tree, and Support Vector Machine models, and each of those evaluations had very high precision, recall, and F1 scores, supporting the evidence from the data visualization.
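As an illustration of the comparison described above, here is a minimal sketch that fits the same four kinds of classifiers and reports precision, recall, and F1 on held-out data. It assumes scikit-learn and uses synthetic data from `make_classification` in place of our mortgage CSV; the hyperparameters, the scaling step, and the variable names are assumptions rather than the settings from our notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded mortgage data (5 features, binary label).
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Logistic regression and the SVM are scale-sensitive, so they get a scaler.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    scores[name] = {
        "precision": precision_score(y_test, predicted, zero_division=0),
        "recall": recall_score(y_test, predicted, zero_division=0),
        "f1": f1_score(y_test, predicted, zero_division=0),
    }

for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Replacing the synthetic `X` and `y` with the encoded attributes and the acceptance column would produce the comparable scores for the real data set.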

Challenges we ran into

None of us had any prior experience in data analytics. We had a hard time figuring out the Anaconda libraries and importing files into the Jupyter notebook. In response, we found a solution by collaborating on Google Colaboratory (Colab). The data set that was provided has a lot of variables we weren't familiar with, so we did thorough research to really understand it. Since the data set was huge, the model took a really long time to process on the low-memory devices we had. We knew the mathematical concepts behind the model but, having no previous experience, we struggled in the beginning to build an ML algorithm for it.

We barely had time to do our video presentation, so that process was rushed. We created the video under the pressure of a 10-minute window.

Accomplishments that we're proud of

In a span of less than 20 hours, we dove into the core of ML and built a model that could predict the likelihood of mortgage acceptance based on non-demographic factors.

What we learned

We learned to analyze several datasets using ML and to collaborate as a team to optimize results.

What's next for Mortgage Loan Analysis


Built With

  • anaconda
  • jupyter-notebook
  • python