ML-Life-Cycle-Modeling

Project Overview

This project is part of my eCornell Machine Learning portfolio and focuses on the modeling stage of the ML life cycle.
Using the Airbnb NYC Listings dataset, I trained and evaluated Decision Tree (DT) and K-Nearest Neighbors (KNN) classifiers, tuning their hyperparameters to identify the best‑performing model. The lab emphasizes feature engineering, model comparison, and evaluation metrics to guide predictive problem‑solving.

Objectives

Define the ML problem and prepare a feature matrix.
Perform one‑hot encoding for categorical variables.
Train and optimize Decision Tree and KNN models.
Compare model performance and select the optimal one.
Interpret results and discuss considerations for deployment.
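
As a rough illustration of the one‑hot encoding objective, the sketch below encodes a single categorical column with pandas. The toy rows, the column names, and the `is_superhost` label are assumptions made for the example, not the notebook's actual schema.

```python
import pandas as pd

# Toy stand-in for the listings data; column names are illustrative only.
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Shared room", "Private room"],
    "n_reviews": [12, 3, 0, 45],
    "is_superhost": [True, False, False, True],  # hypothetical label column
})

# Separate the label from the features before encoding.
y = listings["is_superhost"]
X = listings.drop(columns="is_superhost")

# Replace the categorical column with one 0/1 indicator column per category.
X_encoded = pd.get_dummies(X, columns=["room_type"], dtype=int)

print(X_encoded.columns.tolist())
```

Numeric columns pass through unchanged, so the result can be used directly as a feature matrix.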

Methodology

Data Preparation – Feature engineering, one‑hot encoding, and scaling where needed.
Model Training – Train DT and KNN classifiers with various hyperparameter configurations.
Evaluation – Measure accuracy for each configuration and plot accuracy against each hyperparameter.
Comparison – Analyze trade‑offs between DTs and KNNs to determine the most suitable model.
Conclusion – Summarize findings and the best‑performing configuration.
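
The training and evaluation steps above can be sketched as a small hyperparameter sweep. This is a minimal sketch assuming scikit-learn; the synthetic data, the `sweep` helper, and the specific depth and k values are placeholders rather than the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix and label.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

def sweep(make_model, values):
    """Fit one model per hyperparameter value and return {value: test accuracy}."""
    scores = {}
    for value in values:
        model = make_model(value).fit(X_train, y_train)
        scores[value] = accuracy_score(y_test, model.predict(X_test))
    return scores

# Decision trees: vary the maximum depth.
dt_scores = sweep(
    lambda depth: DecisionTreeClassifier(max_depth=depth, random_state=0),
    [2, 4, 8, 16],
)

# KNN is distance-based, so features are standardized inside the pipeline.
knn_scores = sweep(
    lambda k: make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k)),
    [1, 5, 15, 45],
)

best_depth = max(dt_scores, key=dt_scores.get)
best_k = max(knn_scores, key=knn_scores.get)
print(f"DT:  best max_depth={best_depth}, accuracy={dt_scores[best_depth]:.3f}")
print(f"KNN: best k={best_k}, accuracy={knn_scores[best_k]:.3f}")
```

Picking the best value on the test split, as done here for brevity, leaks information into model selection; a separate validation split or cross‑validation (for example `GridSearchCV`) would be the more careful choice.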

📂 Files in this Repository

CompareKNNsAndDTs.ipynb – Main Jupyter Notebook with full implementation, including training, evaluation, and comparison of DTs and KNNs.
CompareKNNsAndDTs.py – Script version of the notebook, containing the modeling logic for non‑notebook workflows.

Data files (in the Database folder):

airbnb_readytoOHE.csv – Preprocessed Airbnb NYC data, ready for one‑hot encoding.
airbnbData_Prepared.csv – Final processed Airbnb NYC data, fully encoded and ready for model training.
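
A prepared CSV like the ones listed above could be split into a feature matrix and a label roughly as follows. The `load_xy` helper is hypothetical, and the label column name is left as a parameter because this README does not state it.

```python
from pathlib import Path

import pandas as pd

def load_xy(csv_path, label_col):
    """Read a prepared CSV and split it into features X and label y."""
    df = pd.read_csv(Path(csv_path))
    if label_col not in df.columns:
        raise KeyError(f"label column {label_col!r} not found in {csv_path}")
    y = df[label_col]
    X = df.drop(columns=label_col)
    return X, y
```

For example, `X, y = load_xy("Database/airbnbData_Prepared.csv", label_col)`, where the relative path is an assumption about the repository layout and `label_col` is whichever column the notebook predicts.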

Results & Key Findings

Both DT and KNN models were trained and tuned across multiple hyperparameters.
Accuracy varies with the hyperparameter configuration. The best model (reported in the notebook) outperforms the baseline, and the results illustrate the bias–variance trade‑off.
Plots of accuracy against each hyperparameter help select robust configurations and reveal overfitting at large tree depths (DT) and at poorly chosen k values (KNN), typically very small ones.
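
One way to see the overfitting pattern described above is to compare training and test accuracy for shallow versus unrestricted models, alongside a majority‑class baseline. This is a sketch assuming scikit-learn and synthetic data with some label noise; the model settings are illustrative, and the README does not say which baseline the notebook uses.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% flipped labels, so memorizing the training set hurts.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "baseline (majority class)": DummyClassifier(strategy="most_frequent"),
    "DT max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "DT unlimited depth": DecisionTreeClassifier(max_depth=None, random_state=0),
    "KNN k=1": KNeighborsClassifier(n_neighbors=1),
    "KNN k=15": KNeighborsClassifier(n_neighbors=15),
}

# Store (train accuracy, test accuracy) for each model.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    train_acc, test_acc = results[name]
    print(f"{name:28s} train={train_acc:.3f}  test={test_acc:.3f}")
```

An unrestricted tree and a 1‑nearest‑neighbor model both memorize the training set, so a large gap between their training and test accuracy is the signature of high variance; the constrained variants trade some training accuracy for a smaller gap.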

▶️ How to Run

Clone the repo:

git clone https://github.com/<your-username>/<this-repo>.git
cd <this-repo>
