clone the repository
run in terminal: pip3 install -r requirements.txt
run in terminal: python3 main.py small (or medium, or large — pick one dataset size)
The model's accuracy is printed to the terminal, and the figures open automatically.
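Internally, main.py presumably maps its single size argument to one of the bundled dataset files. A minimal sketch of that dispatch — the dictionary name, function name, and file paths here are assumptions, not the repo's actual code:

```python
import sys

# Hypothetical mapping from size argument to dataset file;
# the real filenames in the repo may differ.
DATASETS = {
    "small": "data/wildfire_small.csv",
    "medium": "data/wildfire_medium.csv",
    "large": "data/wildfire_large.csv",
}

def pick_dataset(argv):
    """Return the CSV path for the requested dataset size, or exit with usage."""
    if len(argv) < 2 or argv[1] not in DATASETS:
        raise SystemExit("usage: python3 main.py {small|medium|large}")
    return DATASETS[argv[1]]
```

Validating the argument up front gives a clear usage message instead of a stack trace when the size is missing or misspelled.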
Wildfires are increasing in frequency and impact, and early risk signals can help communities prepare sooner. This project was inspired by the need for a simple, practical way to estimate wildfire risk from accessible environmental indicators (temperature, humidity, wind, rainfall, and fire history). The goal was to build an understandable ML tool that can predict wildfire risks and hopefully prevent harm to communities.
Languages: Python
Frameworks and Libraries:
- pandas
- numpy
- scikit-learn
- matplotlib and seaborn
Platforms: Local Python virtual environment
Tools: VS Code, GitHub, Copilot
Wildfire Risk Detection is a machine learning pipeline that predicts the probability of a wildfire occurrence based on weather and historical fire-related features.
Core functionality:
- Loads wildfire datasets (small/medium/large variants)
- Preprocesses data (including categorical Yes/No encoding)
- Trains a RandomForestClassifier
- Evaluates predictive performance on held-out test data
- Produces visual outputs, including:
  - feature importance
  - confusion matrix
  - predicted risk distribution
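The core steps above (load, encode Yes/No columns, train a RandomForestClassifier, evaluate on held-out data) can be sketched as follows. This runs on a small synthetic frame so it is self-contained; the column names and label rule are illustrative assumptions, not the repo's actual schema:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one of the wildfire CSVs (columns are assumptions)
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "temperature": rng.normal(30, 8, n),
    "humidity": rng.uniform(10, 90, n),
    "wind_speed": rng.uniform(0, 40, n),
    "rainfall": rng.exponential(5, n),
    "prior_fire": rng.choice(["Yes", "No"], n),
})

# Categorical Yes/No encoding, as in the preprocessing step
df["prior_fire"] = (df["prior_fire"] == "Yes").astype(int)

# Toy label correlated with heat and dryness, just to make the sketch runnable
df["fire"] = ((df["temperature"] > 30) & (df["humidity"] < 50)).astype(int)

# Hold out a test split, train the forest, and score it
X = df.drop(columns="fire")
y = df["fire"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
```

Swapping the synthetic frame for `pd.read_csv(...)` on one of the repo's dataset variants gives the real pipeline shape.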
User experience:
- Simple script-based workflow (main.py) for end-to-end execution
- Interpretable charts that help users understand not only how accurate the model is, but also why it predicts risk and how its errors break down (e.g. whether incorrect predictions are false positives or false negatives)
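The two interpretability charts mentioned above can be produced in a few lines of matplotlib + seaborn. This sketch trains a throwaway model on random data so it stands alone; the feature names are assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Throwaway model on random data, just to have importances and predictions
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)

features = ["temperature", "humidity", "wind_speed"]  # illustrative names
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Feature importance: which inputs drive the prediction
ax1.bar(features, model.feature_importances_)
ax1.set_title("Feature importance")

# Confusion matrix: false positives vs false negatives at a glance
cm = confusion_matrix(y, model.predict(X))
sns.heatmap(cm, annot=True, fmt="d", ax=ax2)
ax2.set_title("Confusion matrix")

fig.savefig("model_report.png")
```

Reading the off-diagonal cells of the heatmap is what lets a user see whether errors are mostly false alarms or missed fires.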
AI usage: Less than 70% of the code was AI-generated.
Data handling: pandas for loading and cleaning datasets
Modeling: scikit-learn Random Forest for classification
Pipeline structure: modular files for data prep, model training/evaluation, and visualization
Visualization: matplotlib + seaborn for interpretable analytics plots
Finding usable data for this project was very difficult. I spent a lot of time searching and could not find a dataset that fit. The closest match was the Australian Wildfires Data from 2016-2021, published by the Australian government in 2024, but even that did not suffice. To keep the project moving, I used sample data generated by Claude to train and test the model; all of it is included in the repo for inspection. Ideally, this project would have used real-world data, but that was not possible here.
- Built an end-to-end ML workflow from raw CSV to prediction and insights.
- Fixed real preprocessing issues that blocked model training.
- Added multiple visualizations beyond plain accuracy for better model interpretability.
- Kept the code modular and readable for future extension.