Authors: Prashanta Saha & Matt Wong
The main goal of this project is to predict flight delays for the first 7 days of 2020.
Midterm Project Submission File
These files were provided to us at the beginning of the project:
- exploratory_analysis.ipynb: The exploratory data analysis contains 10 questions that we needed to answer during the data exploration phase. It helped us get familiar with the variables and dataset.
- data_description.md: Contains the description of all variables across 4 tables in the dataset.
- modeling.ipynb: Contains the instructions for modeling parts of the project.
- sample_submission.csv: Example of the format in which the model's predictions are to be submitted.
This is the notebook we used to clean and manipulate the data and to engineer features:
- Data Cleaning and Feature Engineering.ipynb
These files contain the various models we used:
- Base model with Linear Regression.ipynb
- Optimized model with Linear Regression.ipynb
We were provided with 4 tables of data from the air travel industry:
- flights
- passengers
- fuel_consumption
- flights_test