Devanshu1503/airport-delay-forecasting

✈️ Airport Congestion & Delay Forecasting

Time series forecasting project analyzing daily flight delay patterns and airport congestion across major U.S. airports.

This project models and predicts daily delay rates using historical flight data and statistical forecasting models. The goal is to better understand congestion dynamics and short-term disruption patterns in the airline network.


⚙️ Setup and Installation

This project requires Python 3.10+.

1) Create and activate a virtual environment

macOS / Linux

python3 -m venv .venv
source .venv/bin/activate

2) Install the dependencies

pip install -r requirements.txt

📦 Data Access

The Data/ folder in this repository contains only a partial dataset.

The full dataset can be accessed here:

Full Data (Google Drive)
https://drive.google.com/drive/u/0/folders/1eyxYy6AeFmnw5Cr-WBU_vHM1vl78xwXZ

| File Name | Description | Rows | Size |
| --- | --- | --- | --- |
| US_Flight_Data_Slice.csv | Partial dataset (first 200,000 rows) | 200,000 | 19.1 MB |
| US_Flight_Data_Full.csv | Full raw dataset | 26,344,655 | 1.31 GB |
| flights_clean_filtered.csv | Cleaned dataset filtered to the top 10 US carriers and top 60 origin airports | 18,075,343 | 1.29 GB |
| flights_clean_filtered_encoded.csv | Same rows as flights_clean_filtered.csv, with one-hot encoded columns for "Airline Name" and "Origin" | 18,075,343 | 7.9 GB |
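Because the larger files are over 1 GB, it can help to read them in chunks and keep only the rows and columns you need. The following is a minimal sketch, not code from this repository: the helper name `load_origin`, the chosen columns, the chunk size, and the ISO date format in the sample are all assumptions. A small in-memory CSV stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd


def load_origin(source, origin, chunksize=1_000_000):
    """Read a flights CSV in chunks and keep only rows for one origin airport.

    `source` can be a file path or a file-like object.
    """
    usecols = ["Date", "Origin", "Delay", "Cancelled"]  # assumed subset of columns
    kept = []
    for chunk in pd.read_csv(
        source, usecols=usecols, parse_dates=["Date"], chunksize=chunksize
    ):
        kept.append(chunk[chunk["Origin"] == origin])
    return pd.concat(kept, ignore_index=True)


# Hypothetical sample rows standing in for flights_clean_filtered.csv
sample_csv = io.StringIO(
    "Date,Carrier,Origin,Dest,Delay,Cancelled\n"
    "2023-01-01,DL,ATL,JFK,5,0\n"
    "2023-01-01,AA,JFK,LAX,20,0\n"
    "2023-01-02,DL,ATL,ORD,45,0\n"
    "2023-01-02,UA,ORD,ATL,0,0\n"
)

atl_flights = load_origin(sample_csv, "ATL", chunksize=2)
print(atl_flights)
```

With the real data, pass the path to the downloaded CSV instead of `sample_csv`.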

📊 Dataset Columns

The flights_clean_filtered.csv dataset contains the following variables:

Date
Carrier
Airline Name
Flight_Num
Origin
Dest
Delay
Cancelled
Dep_Time_HHMM
Actual_Dep_HHMM
Dep_Hour
Year
Month
Quarter
DayOfWeek

These features enable analysis of temporal delay patterns, airline behavior, and airport congestion dynamics.
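As an illustration of how these columns can be turned into a daily delay rate per airport, here is a minimal sketch. It rests on assumptions that this README does not confirm: that Delay is a departure delay in minutes, that Cancelled is a 0/1 flag, that cancelled flights are excluded, and that a flight counts as delayed at 15 minutes or more. The sample rows are made up.

```python
import pandas as pd

# Hypothetical rows using a subset of the dataset's column names
flights = pd.DataFrame(
    {
        "Date": ["2023-01-01"] * 6 + ["2023-01-02"] * 2,
        "Origin": ["ATL", "ATL", "JFK", "JFK", "JFK", "JFK", "ATL", "ATL"],
        "Delay": [5, 20, 0, 10, 60, None, 45, 16],
        "Cancelled": [0, 0, 0, 0, 0, 1, 0, 0],
    }
)
flights["Date"] = pd.to_datetime(flights["Date"])

DELAY_THRESHOLD_MIN = 15  # assumption: 15+ minutes counts as delayed

# Keep only flights that actually operated (assumption: Cancelled is 0/1)
operated = flights[flights["Cancelled"] == 0]

# Share of operated flights per origin and day that met the threshold
daily_delay_rate = (
    operated.assign(is_delayed=operated["Delay"] >= DELAY_THRESHOLD_MIN)
    .groupby(["Origin", "Date"])["is_delayed"]
    .mean()
    .rename("delay_rate")
    .reset_index()
)
print(daily_delay_rate)
```

The result has one row per airport and day, which is the shape a daily time series model would consume.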


⚠️ Encoding Recommendation

We recommend starting from:

flights_clean_filtered.csv

and applying sparse one-hot encoding yourself, rather than loading the pre-encoded CSV file.

The encoded dataset (flights_clean_filtered_encoded.csv) is about 7.9 GB on disk, roughly six times the size of the unencoded file, and loading it can consume far more memory than modeling requires.


🧠 Sample Sparse Encoding Code

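import pandas as pd

# Load the cleaned, filtered dataset (adjust the path to where you saved it)
flights_clean_filtered = pd.read_csv("flights_clean_filtered.csv")

# One-hot encode the two categorical columns as sparse columns
flights_encoded = pd.get_dummies(
    flights_clean_filtered,
    columns=["Airline Name", "Origin"],
    drop_first=True,  # drop one level per column to avoid redundant dummies
    sparse=True,      # store the dummy columns as sparse arrays
)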

Sparse encoding stores only the non-zero entries of each dummy column, which reduces memory usage during modeling and avoids writing a multi-gigabyte encoded file to disk.
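To see the effect on memory, the sketch below compares dense and sparse one-hot encoding on synthetic data. The airline and airport labels are made up; only the counts (10 carriers, 60 origin airports) mirror the filtered dataset, and the exact savings on the real data will differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows = 100_000

# Synthetic stand-ins: 10 airline names and 60 origin codes (labels are made up)
airlines = [f"Airline {i}" for i in range(10)]
origins = [f"AP{i:02d}" for i in range(60)]

flights = pd.DataFrame(
    {
        "Airline Name": rng.choice(airlines, size=n_rows),
        "Origin": rng.choice(origins, size=n_rows),
        "Delay": rng.normal(10, 30, size=n_rows),
    }
)

encode_cols = ["Airline Name", "Origin"]
dense = pd.get_dummies(flights, columns=encode_cols, drop_first=True, sparse=False)
sparse = pd.get_dummies(flights, columns=encode_cols, drop_first=True, sparse=True)

# Total in-memory size of each encoded frame, in megabytes
dense_mb = dense.memory_usage(deep=True).sum() / 1e6
sparse_mb = sparse.memory_usage(deep=True).sum() / 1e6
print(f"dense:  {dense_mb:.1f} MB")
print(f"sparse: {sparse_mb:.1f} MB")
```

Both frames have the same columns; only the storage of the dummy columns differs.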


📂 Repository Structure

airport-delay-forecasting
│
├── Airport_Forecasting_EDA.ipynb
├── Airport_Forecasting_EDA.html
├── feature_engineering.ipynb
├── feature_engineering_airline.ipynb
├── modelling.ipynb
├── requirements.txt
└── README.md

👨‍💻 Authors

Devanshu Khadka
Luciano Handal Baracatt
Pareeyaporn Prachaseree

About

Time series forecasting of US airport delays using STL decomposition and SARIMA models.
