Inspiration

This project was inspired by the increasing severity of air pollution and smog events in Pakistan. We aimed to build a data-driven system that can help analyze air quality patterns and support better environmental decision-making.

What We Learned

Through this project, we gained experience working with real-world environmental time-series data. We learned data preprocessing techniques such as handling missing values and engineering lag and rolling-statistic features, and we applied machine learning models for anomaly detection. We also developed a better understanding of how different pollutants interact and affect the Air Quality Index (AQI).

How We Built the Project

We built the solution using a structured data science pipeline. First, we cleaned and preprocessed the dataset by handling missing values and removing inconsistencies. Then, we created time-based features such as lag features and rolling means to capture temporal patterns.
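The feature-engineering step described above could look roughly like the following sketch. It is illustrative only, not the project's actual code: the column name `pm25`, the helper `add_time_features`, the lag and window sizes, and the choice to fill short gaps by time-based interpolation are all assumptions.

```python
import numpy as np
import pandas as pd


def add_time_features(df, column="pm25", lags=(1, 24), window=24):
    """Fill short gaps, then add lag and rolling-mean features (hypothetical helper)."""
    out = df.sort_index().copy()
    # Assumption: gaps of up to 3 consecutive readings are filled by
    # time-weighted linear interpolation; longer gaps stay missing.
    out[column] = out[column].interpolate(method="time", limit=3)
    # Lag features: the value observed `lag` rows earlier.
    for lag in lags:
        out[f"{column}_lag{lag}"] = out[column].shift(lag)
    # Rolling mean over the previous `window` rows; shift(1) excludes the
    # current reading so the feature only uses past values.
    out[f"{column}_roll{window}"] = (
        out[column].shift(1).rolling(window, min_periods=1).mean()
    )
    return out


# Tiny made-up hourly series with one missing reading.
index = pd.date_range("2024-01-01", periods=6, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"pm25": [10.0, 20.0, np.nan, 40.0, 50.0, 60.0]}, index=index)
features = add_time_features(demo, lags=(1,), window=2)
print(features)
```

Shifting before taking the rolling mean is a design choice that avoids leaking the current reading into its own feature, which matters if the features later feed a forecasting or anomaly model.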

We applied machine learning models, including Isolation Forest, to detect anomalies in the air quality data. We also explored rule-based and statistical approaches for classifying the sources of pollution events.
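As a minimal sketch of the anomaly-detection step, the snippet below fits scikit-learn's `IsolationForest` on synthetic data. The two feature columns, the injected spikes, and the `contamination` value are assumptions made for illustration; they do not reflect the project's real dataset or tuning.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for pollutant readings (two hypothetical features),
# plus two injected extreme spikes at the end.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[80.0, 40.0], scale=[10.0, 5.0], size=(200, 2))
spikes = np.array([[400.0, 150.0], [350.0, 170.0]])
X = np.vstack([normal, spikes])

# contamination is the assumed fraction of anomalous rows; it sets the
# score threshold used by fit_predict.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower score = more anomalous

anomaly_idx = np.where(labels == -1)[0]
print("Rows flagged as anomalies:", anomaly_idx)
```

In practice the input would be the engineered features rather than raw readings, and the `contamination` value would need to be chosen from domain knowledge or validation rather than fixed in advance.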
