This project analyzes loan applications in Wisconsin for the year 2020, using the Home Mortgage Disclosure Act (HMDA) dataset. The goal is to explore lending patterns, identify potential discrimination, and practice Object-Oriented Programming (OOP) and data structures in Python.
Specifically, this project focuses on:
- Creating custom classes for `Applicant`, `Loan`, and `Bank` to efficiently handle loan data.
- Implementing a Binary Search Tree (BST) for efficient loan lookups.
- Performing statistical analysis on interest rates, applicant demographics, and loan characteristics.
- Benchmarking BST performance versus naive approaches.
├── loan.py # Classes for Applicant, Loan, and Bank
├── search.py # Node and BST classes
├── mp3.ipynb # Notebook with analysis and questions
├── banks.json # Bank metadata
├── wi.zip # HMDA loan data (CSV inside)
└── README.md # Project documentation
`Applicant` represents a loan applicant or co-applicant.
Attributes:
- `age` (str): Age or age range of the applicant.
- `race` (set): Set of racial identities for the applicant.
Methods:
- `lower_age()`: Returns the lower bound of the age range as an integer.
- `__repr__()`: String representation of the applicant.
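As a rough illustration of `lower_age()`, here is a minimal sketch. It assumes ages arrive as strings such as `"25-34"`, `"<25"`, or `">74"`, and that races are passed in as plain strings. Both are assumptions made for this example, not details confirmed by `loan.py`.

```python
class Applicant:
    """Sketch of an applicant; the age formats handled here are assumptions."""

    def __init__(self, age, race):
        self.age = age          # e.g. "25-34", "<25", ">74" (assumed formats)
        self.race = set(race)   # set of racial identities

    def lower_age(self):
        # Drop comparison symbols, then keep the part before any dash.
        cleaned = self.age.replace("<", "").replace(">", "")
        return int(cleaned.split("-")[0])

    def __repr__(self):
        # Sort the races so the representation is stable across runs.
        return f"Applicant({self.age!r}, {sorted(self.race)!r})"


applicant = Applicant("25-34", ["White", "Asian"])
print(applicant.lower_age())  # 25
print(applicant)              # Applicant('25-34', ['Asian', 'White'])
```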
`Loan` represents a single loan application.
Attributes:
- `loan_amount` (float)
- `property_value` (float)
- `interest_rate` (float)
- `applicants` (list of `Applicant` objects)
Methods:
- `yearly_amounts(yearly_payment)`: Generator for yearly outstanding loan amounts.
- `__str__()` / `__repr__()`: Human-readable representation.
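One possible shape for the `yearly_amounts` generator is sketched below. The schedule used here (yield the balance, add one year of interest, then subtract the payment) and the constructor signature are assumptions for illustration. The actual `Loan` class may build its fields differently.

```python
class Loan:
    """Sketch of a loan; the constructor arguments are assumed for this example."""

    def __init__(self, loan_amount, property_value, interest_rate, applicants=None):
        self.loan_amount = float(loan_amount)
        self.property_value = float(property_value)
        self.interest_rate = float(interest_rate)  # percent per year
        self.applicants = list(applicants or [])

    def yearly_amounts(self, yearly_payment):
        # Assumed schedule: report the balance, accrue interest, apply the payment.
        amount = self.loan_amount
        while amount > 0:
            yield amount
            amount += amount * self.interest_rate / 100
            amount -= yearly_payment


loan = Loan(loan_amount=1000, property_value=2000, interest_rate=10)
print(list(loan.yearly_amounts(600)))  # [1000.0, 500.0]
```

Note that with this schedule the generator never finishes if the yearly payment does not exceed the yearly interest, so callers may want to cap the number of years they consume.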
`Bank` represents a bank and its loans.
Attributes:
- `bank` (str): Bank name
- `lei` (str): Legal Entity Identifier
- `loan_list` (list): All `Loan` objects for this bank
Special Methods:
- `__len__()` → Returns number of loans
- `__getitem__(index)` → Enables indexing: `bank[0]`
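The two special methods can be sketched as thin wrappers around `loan_list`. In this example the constructor takes the loans directly and the LEI is a made-up placeholder. The real class presumably loads its data from `banks.json` and `wi.zip`, which is not shown here.

```python
class Bank:
    """Sketch of a bank container; loading from files is left out."""

    def __init__(self, bank, lei, loans=()):
        self.bank = bank
        self.lei = lei
        self.loan_list = list(loans)

    def __len__(self):
        # len(bank) reports how many loans the bank holds.
        return len(self.loan_list)

    def __getitem__(self, index):
        # bank[0], bank[-1], and slices all defer to the underlying list.
        return self.loan_list[index]


# Placeholder strings stand in for Loan objects; the LEI is hypothetical.
bank = Bank("Example Bank", "EXAMPLELEI0000000000", ["loan A", "loan B"])
print(len(bank))  # 2
print(bank[0])    # loan A
```

Because `__getitem__` raises `IndexError` past the end of the list, a plain `for loan in bank:` loop also works without defining `__iter__`.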
`search.py` defines a custom Binary Search Tree (the `Node` and `BST` classes) for storing loans keyed by interest rate.
Node Attributes:
- `key` (float)
- `values` (list of `Loan` objects)
- `left`, `right` (`Node`)
BST Methods:
- `add(key, val)`: Adds a loan to the tree.
- `__getitem__(key)`: Returns number of loans with the specified key.
- `height()`: Returns tree height.
- `count_leaves()`: Counts the number of leaf nodes.
- `find_top_n(n)`: Returns top N interest rates.
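A minimal sketch of how these pieces might fit together is shown below. The `root` attribute, the iterative insertion, and the convention that a single-node tree has height 1 are assumptions for this example. `find_top_n` is omitted to keep it short.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.values = []   # every loan stored under this key
        self.left = None
        self.right = None


class BST:
    def __init__(self):
        self.root = None

    def add(self, key, val):
        if self.root is None:
            self.root = Node(key)
        curr = self.root
        while True:
            if key < curr.key:
                if curr.left is None:
                    curr.left = Node(key)
                curr = curr.left
            elif key > curr.key:
                if curr.right is None:
                    curr.right = Node(key)
                curr = curr.right
            else:
                curr.values.append(val)
                return

    def __getitem__(self, key):
        # Walk down the tree; return how many values share this key.
        curr = self.root
        while curr is not None and curr.key != key:
            curr = curr.left if key < curr.key else curr.right
        return len(curr.values) if curr is not None else 0

    def height(self):
        def _height(node):
            if node is None:
                return 0
            return 1 + max(_height(node.left), _height(node.right))
        return _height(self.root)

    def count_leaves(self):
        def _leaves(node):
            if node is None:
                return 0
            if node.left is None and node.right is None:
                return 1
            return _leaves(node.left) + _leaves(node.right)
        return _leaves(self.root)


tree = BST()
for rate, loan in [(3.0, "a"), (2.5, "b"), (4.0, "c"), (3.0, "d")]:
    tree.add(rate, loan)
print(tree[3.0], tree[9.9])                # 2 0
print(tree.height(), tree.count_leaves())  # 2 2
```

Keeping a list of values per node means duplicate interest rates do not deepen the tree, so the number of nodes is bounded by the number of distinct rates.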
- Average Interest Rate: Calculated per bank, ignoring missing values.
- Applicants per Loan: Average number of applicants (applicant + co-applicant).
- Age Distribution: Frequency of applicants in each age bracket.
- BST Analysis
  - Count missing interest rates without looping through all loans.
  - Compute tree height and number of leaves.
  - Efficient lookup for specific interest rates.
- Performance Benchmarking
  - Time to add first 15,000 loans to BST.
  - Compare lookup time using BST vs naive iteration.
- Racial Identity Distribution
  - Bar chart showing the number of racial identities per applicant.
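The benchmarking step could be set up roughly as follows. This sketch uses synthetic interest rates and a compact stand-alone tree (the hypothetical helpers `bst_add` and `bst_count`) rather than the project's `BST` class and the HMDA data, so the timings it prints say nothing about the real results.

```python
import random
import time


class Node:
    def __init__(self, key):
        self.key, self.values = key, []
        self.left = self.right = None


def bst_add(root, key, val):
    """Insert val under key and return the (possibly new) root."""
    if root is None:
        root = Node(key)
    curr = root
    while curr.key != key:
        side = "left" if key < curr.key else "right"
        if getattr(curr, side) is None:
            setattr(curr, side, Node(key))
        curr = getattr(curr, side)
    curr.values.append(val)
    return root


def bst_count(root, key):
    """Return how many values are stored under key."""
    curr = root
    while curr is not None and curr.key != key:
        curr = curr.left if key < curr.key else curr.right
    return len(curr.values) if curr is not None else 0


random.seed(0)
# Synthetic stand-in for the first 15,000 loans' interest rates.
rates = [round(random.uniform(2.0, 6.0), 2) for _ in range(15_000)]

start = time.perf_counter()
root = None
for i, rate in enumerate(rates):
    root = bst_add(root, rate, i)
build_seconds = time.perf_counter() - start

target = rates[0]

start = time.perf_counter()
naive = sum(1 for rate in rates if rate == target)  # scans every loan
naive_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = bst_count(root, target)  # follows a single root-to-node path
bst_seconds = time.perf_counter() - start

assert naive == fast
print(f"build: {build_seconds:.4f}s, naive: {naive_seconds:.6f}s, BST: {bst_seconds:.6f}s")
```

A single timed lookup is noisy, so repeating each measurement several times and comparing the averages would give a fairer picture.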
- Install required packages: `pip install matplotlib pandas`
- Run the notebook for analysis: `jupyter notebook mp3.ipynb`