Time series forecasting project analyzing daily flight delay patterns and airport congestion across major U.S. airports.
This project models and forecasts daily delay rates from historical flight data using statistical forecasting methods. The goal is to better understand congestion dynamics and short-term disruption patterns in the U.S. airline network.
This project requires Python 3.10+.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

The Data/ folder in this repository contains partial data only.
The full dataset is available on Google Drive:
[Full Data (Google Drive)](https://drive.google.com/drive/u/0/folders/1eyxYy6AeFmnw5Cr-WBU_vHM1vl78xwXZ)
| File Name | Description | Rows | Size |
|---|---|---|---|
| US_Flight_Data_Slice.csv | Partial dataset (first 200,000 rows) | 200,000 | 19.1 MB |
| US_Flight_Data_Full.csv | Full raw dataset | 26,344,655 | 1.31 GB |
| flights_clean_filtered.csv | Cleaned dataset filtered to the top 10 U.S. carriers and top 60 origin airports | 18,075,343 | 1.29 GB |
| flights_clean_filtered_encoded.csv | Same rows as flights_clean_filtered.csv, with one-hot encoded columns for "Airline Name" and "Origin" | 18,075,343 | 7.9 GB |
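Because the full files are large, a quick sanity check after downloading is to count their rows in chunks instead of loading them whole. The sketch below is a minimal example assuming pandas is installed; `count_csv_rows` is a hypothetical helper, and comparing its result with the Rows column above assumes those counts exclude the header line.

```python
import pandas as pd


def count_csv_rows(source, chunksize=1_000_000):
    """Count data rows (header excluded) without loading the whole file.

    `source` is a file path or file-like object; this is a hypothetical
    helper, not part of the repository.
    """
    total = 0
    # usecols=[0] parses only the first column, keeping each chunk small
    with pd.read_csv(source, usecols=[0], chunksize=chunksize) as reader:
        for chunk in reader:
            total += len(chunk)
    return total
```

For example, `count_csv_rows("Data/flights_clean_filtered.csv")` (a hypothetical local path) would be expected to return 18,075,343 for a complete download, under the header assumption above.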
The flights_clean_filtered.csv dataset contains the following variables:
- Date
- Carrier
- Airline Name
- Flight_Num
- Origin
- Dest
- Delay
- Cancelled
- Dep_Time_HHMM
- Actual_Dep_HHMM
- Dep_Hour
- Year
- Month
- Quarter
- DayOfWeek
These features enable analysis of temporal delay patterns, airline behavior, and airport congestion dynamics.
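A daily delay rate series can be derived from the Date, Delay, and Cancelled columns. The sketch below assumes Delay is the departure delay in minutes, Cancelled is coded 0/1, and a flight counts as delayed at 15 or more minutes (mirroring the 15-minute cutoff used in U.S. DOT on-time reporting); the project's own definition may differ. It reads the CSV in chunks so the 1.29 GB file never has to fit in memory at once. The helper name `daily_delay_rate` is hypothetical.

```python
import pandas as pd


def daily_delay_rate(source, threshold_min=15, chunksize=1_000_000):
    """Share of operated flights per Date with Delay >= threshold_min.

    Assumes Delay is in minutes and Cancelled is coded 0/1 (assumed
    conventions; check them against the data before relying on this).
    """
    delayed_parts, total_parts = [], []
    cols = ["Date", "Delay", "Cancelled"]
    with pd.read_csv(source, usecols=cols, chunksize=chunksize) as reader:
        for chunk in reader:
            # Keep flights that actually operated and have a recorded delay
            flown = chunk[(chunk["Cancelled"] == 0) & chunk["Delay"].notna()]
            is_delayed = (flown["Delay"] >= threshold_min).astype("int64")
            by_date = is_delayed.groupby(flown["Date"])
            delayed_parts.append(by_date.sum())  # delayed flights per date
            total_parts.append(by_date.size())   # operated flights per date
    # A single date can straddle chunk boundaries, so re-aggregate the counts
    delayed = pd.concat(delayed_parts).groupby(level=0).sum()
    total = pd.concat(total_parts).groupby(level=0).sum()
    rate = delayed / total
    # Assumes Date strings are in a format pd.to_datetime can infer
    rate.index = pd.to_datetime(rate.index)
    return rate.sort_index()
```

The result is a pandas Series with a DatetimeIndex, one value per day, which is the shape most time series forecasting libraries expect as input.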
We recommend using flights_clean_filtered.csv and performing the one-hot encoding yourself with sparse output, rather than relying on the pre-encoded CSV file.
The pre-encoded dataset (flights_clean_filtered_encoded.csv) is about 8 GB on disk, and loading it as dense columns consumes far more memory than modeling requires.
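import pandas as pd

flights_clean_filtered = pd.read_csv("Data/flights_clean_filtered.csv")

flights_encoded = pd.get_dummies(
    flights_clean_filtered,
    columns=["Airline Name", "Origin"],
    drop_first=True,
    sparse=True
)

Here, Data/flights_clean_filtered.csv is an assumed local path for the full file downloaded from Google Drive, and drop_first=True drops one reference level per encoded column. Sparse encoding stores only the non-zero entries of each dummy column, which substantially reduces memory usage during modeling and avoids loading the ≈8 GB pre-encoded file.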
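To illustrate the memory difference, the sketch below compares the in-memory size of dense and sparse dummy columns on synthetic data with 10 airlines and 60 airports, matching the filter described in the table. The airline and airport labels are made up, `encoded_memory` is a hypothetical helper, and exact byte counts depend on the pandas version (newer versions produce bool dummies, older ones uint8).

```python
import numpy as np
import pandas as pd


def encoded_memory(n_rows=100_000, n_airlines=10, n_airports=60, seed=0):
    """Return (dense_bytes, sparse_bytes) for one-hot encoding synthetic data."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-ins for the "Airline Name" and "Origin" columns
    df = pd.DataFrame({
        "Airline Name": rng.choice([f"airline_{i}" for i in range(n_airlines)], n_rows),
        "Origin": rng.choice([f"airport_{i}" for i in range(n_airports)], n_rows),
    })
    cols = ["Airline Name", "Origin"]
    dense = pd.get_dummies(df, columns=cols, drop_first=True, sparse=False)
    sparse = pd.get_dummies(df, columns=cols, drop_first=True, sparse=True)
    # deep=True counts the actual bytes held by each column's storage
    return (
        int(dense.memory_usage(deep=True).sum()),
        int(sparse.memory_usage(deep=True).sum()),
    )


dense_bytes, sparse_bytes = encoded_memory()
print(f"dense: {dense_bytes:,} bytes, sparse: {sparse_bytes:,} bytes")
```

Each sparse column stores only its non-zero values plus their row positions. With at most one non-zero per row for each encoded variable, the sparse frame grows with the number of rows rather than with rows × categories, which is why the gap widens as the number of airports increases.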
airport-delay-forecasting
│
├── Airport_Forecasting_EDA.ipynb
├── Airport_Forecasting_EDA.html
├── feature_engineering.ipynb
├── feature_engineering_airline.ipynb
├── modelling.ipynb
├── requirements.txt
└── README.md
Devanshu Khadka
Luciano Handal Baracatt
Pareeyaporn Prachaseree