Loan Interest Rate Analysis

Summary

Our team chose the beginner track challenge, and our goal was to build models that predict interest rates. We used four different models: linear regression, principal component regression (linear regression on data transformed into principal components), random forests, and neural networks. We learned a lot about data science and the different data science packages available. We also learned how to fine-tune hyperparameters for different machine learning algorithms using the existing functionality of the scikit-learn library.
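As a rough illustration of this workflow, the sketch below fits all four model families with scikit-learn and compares them by root-mean-squared error (RMSE) on a held-out validation set. It is a sketch under stated assumptions, not our actual code: the loan dataset is replaced by synthetic data from make_regression, and the settings (20 principal components, 100 trees, one hidden layer of 64 units) are illustrative choices, not the values we used.

```python
# Sketch: compare the four model families on a held-out validation set.
# ASSUMPTION: synthetic data from make_regression stands in for the loan
# dataset; all hyperparameter values here are illustrative only.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 50 features, only 10 of which carry signal
X, y = make_regression(n_samples=1000, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)

models = {
    "linear_regression": LinearRegression(),
    # PCR: standardize, project onto principal components, then regress
    "pcr": make_pipeline(StandardScaler(), PCA(n_components=20),
                         LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # Neural networks train better on standardized inputs
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
    ),
}

val_rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    # RMSE = square root of the mean squared error on the validation set
    val_rmse[name] = mean_squared_error(y_val, pred) ** 0.5

# Print models from best (lowest RMSE) to worst
for name, rmse in sorted(val_rmse.items(), key=lambda kv: kv[1]):
    print(f"{name:>17}: validation RMSE = {rmse:.2f}")
```

Fitting every model on the same split and scoring with the same metric keeps the comparison fair; on real data, the ranking depends on the dataset and on how well each model is tuned.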

Challenges we ran into

The challenge we encountered was the high dimensionality of this dataset; we had not worked with a dataset with this many features in our basic statistics class. To address this, we learned about dimensionality reduction approaches such as principal component analysis (PCA) and tested whether dimensionality reduction could improve our prediction accuracy on the validation set.
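One way to test whether PCA helps is to treat the number of principal components as a hyperparameter and let cross-validation choose it. The sketch below does this for principal component regression with scikit-learn's Pipeline and GridSearchCV. It assumes synthetic data in which 100 observed features are driven by 5 hidden factors, a stand-in for the loan dataset; the candidate component counts are arbitrary illustrative values.

```python
# Sketch: principal component regression (PCR) with the number of
# components chosen by 5-fold cross-validation.
# ASSUMPTION: synthetic, highly correlated features replace the loan data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# 100 observed features generated from 5 latent factors plus small noise,
# so almost all of the variance lives in a 5-dimensional subspace
latent = rng.normal(size=(1000, 5))
loadings = rng.normal(size=(5, 100))
X = latent @ loadings + 0.1 * rng.normal(size=(1000, 100))
y = latent @ np.array([1.5, -2.0, 0.5, 1.0, 3.0]) + 0.5 * rng.normal(size=1000)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)

pcr = Pipeline([
    ("scale", StandardScaler()),  # PCA is sensitive to feature scale
    ("pca", PCA()),
    ("reg", LinearRegression()),
])
search = GridSearchCV(
    pcr,
    param_grid={"pca__n_components": [2, 5, 10, 20, 50]},
    scoring="neg_root_mean_squared_error",  # higher (less negative) is better
    cv=5,
)
search.fit(X_train, y_train)

best_k = search.best_params_["pca__n_components"]
pca = search.best_estimator_.named_steps["pca"]
explained = pca.explained_variance_ratio_.sum()
# score() returns negative RMSE for this scorer, so flip the sign
val_rmse = -search.score(X_val, y_val)
print(f"best n_components = {best_k}, variance explained = {explained:.3f}, "
      f"validation RMSE = {val_rmse:.3f}")
```

Comparing this validation RMSE against plain linear regression on the same split answers the question directly: if PCR is not better, the dimensionality reduction is not buying predictive accuracy for that dataset.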

What's next?

Explore hyperparameter tuning for the neural network approach, which we did not have time for due to time constraints. Further, investigate whether discrimination exists by performing exploratory data analysis (EDA) and data visualization.
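A possible starting point for that neural network tuning is a randomized search over a few MLPRegressor hyperparameters, sketched below. This is a hypothetical plan rather than something we ran: the synthetic data, the search ranges, and the small budget (8 sampled configurations, 3-fold cross-validation) are all assumptions chosen to keep the example fast.

```python
# Sketch: randomized hyperparameter search for a scikit-learn MLPRegressor.
# ASSUMPTION: synthetic data and search ranges are illustrative only.
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=600, n_features=30, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # early_stopping holds out part of each training fold to decide when to stop
    ("mlp", MLPRegressor(max_iter=500, early_stopping=True, random_state=0)),
])
param_distributions = {
    "mlp__hidden_layer_sizes": [(32,), (64,), (64, 32)],
    "mlp__alpha": loguniform(1e-5, 1e-1),               # L2 penalty strength
    "mlp__learning_rate_init": loguniform(1e-4, 1e-2),  # Adam step size
}
search = RandomizedSearchCV(
    pipe,
    param_distributions,
    n_iter=8,  # number of sampled configurations
    cv=3,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"validation RMSE = {-search.score(X_val, y_val):.2f}")
```

Randomized search is used here instead of an exhaustive grid because neural networks have many interacting hyperparameters; sampling learning rate and regularization on a log scale covers several orders of magnitude with few fits.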

Link to slides: https://docs.google.com/presentation/d/1kqBmwOBEYc2iXH_QnSG95T3nUHX7w7YeVV4neTz6Hco/edit#slide=id.g1111ab265f0_0_26