- Top 30% among Break Through Tech program teams.
- Top 50% out of 800+ global submissions as of March 6th.
This project predicts the probability that a wildfire will "hit" a specific point of interest within each of four time horizons: 12h, 24h, 48h, and 72h.
The dataset posed several challenges, most notably right-censoring: for some fires, no hit had occurred by the end of the observation period, so the true time to hit is unknown. Our team addressed this by blending conventional classification ensembles with survival analysis, which is designed to handle censored observations.
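Right-censoring is what makes fixed-horizon labels tricky. The sketch below shows one way each record could be turned into a binary label per horizon. It assumes each fire record carries a time in hours (`time_h`: time to hit if a hit was observed, otherwise time until observation ended) and an event flag (`event`); these field names and helper functions are hypothetical, not the dataset's actual schema.

```python
from typing import Dict, Optional

HORIZONS_H = (12, 24, 48, 72)


def horizon_label(time_h: float, event: bool, horizon_h: float) -> Optional[int]:
    """Return 1/0 for "hit within horizon", or None when censoring hides the answer.

    time_h: hours until the hit (if event) or until observation ended (if censored).
    event: True if the hit was observed, False if the record is right-censored.
    (Hypothetical field names used for illustration only.)
    """
    if event and time_h <= horizon_h:
        return 1  # hit observed inside the horizon
    if time_h > horizon_h:
        return 0  # the whole horizon elapsed with no hit
    return None  # censored before the horizon ended: outcome unknown


def horizon_labels(time_h: float, event: bool) -> Dict[int, Optional[int]]:
    """Label one record at every horizon."""
    return {h: horizon_label(time_h, event, h) for h in HORIZONS_H}
```

For instance, `horizon_labels(30.0, False)` (a fire censored at 30 h) gives known negatives at 12h and 24h but `None` at 48h and 72h. A classifier has to drop or impute those rows, whereas a survival model can use them as they are, which is the motivation for adding one.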
My team and I designed and implemented a set of features to capture fire dynamics:
- Wavefront ETA: Combines the fire's radial growth rate with its centroid velocity to estimate when the fire front will reach the point of interest.
- Near-Miss Margin: Calculates the geometric gap between the fire's projected radius and the point of interest.
- Threat Gravity: A momentum-based intensity metric normalized by distance.
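The exact feature formulas are not given here, so the following is only one plausible reading of the three features, written as a sketch. It assumes a simplified circular fire with constant growth and approach speed; the `FireSnapshot` fields, their units, and the inverse-square form of Threat Gravity are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class FireSnapshot:
    """Hypothetical per-fire state; field names and units are illustrative."""
    dist_km: float            # distance from fire centroid to the point of interest
    radius_km: float          # current effective fire radius
    growth_kmh: float         # radial growth rate of the perimeter
    closing_speed_kmh: float  # centroid velocity component toward the point
    area_km2: float           # current burned area


def wavefront_eta_h(f: FireSnapshot) -> float:
    """Hours until the front reaches the point, assuming constant rates."""
    gap = f.dist_km - f.radius_km
    if gap <= 0:
        return 0.0  # point already inside the current perimeter
    approach = f.growth_kmh + f.closing_speed_kmh  # perimeter growth plus drift
    return gap / approach if approach > 0 else math.inf


def near_miss_margin_km(f: FireSnapshot, horizon_h: float) -> float:
    """Gap between the point and the radius projected horizon_h hours ahead.

    Negative values mean the projected perimeter already covers the point.
    """
    projected_radius = f.radius_km + f.growth_kmh * horizon_h
    return f.dist_km - projected_radius


def threat_gravity(f: FireSnapshot, eps: float = 1e-6) -> float:
    """Assumed form: momentum-like term (area x approach speed) over distance squared."""
    momentum = f.area_km2 * max(f.growth_kmh + f.closing_speed_kmh, 0.0)
    return momentum / (f.dist_km ** 2 + eps)  # eps guards against zero distance
```

For `FireSnapshot(dist_km=10.0, radius_km=2.0, growth_kmh=0.5, closing_speed_kmh=1.5, area_km2=12.0)`, the front is 8 km away and closing at 2 km/h, so `wavefront_eta_h` returns 4.0 hours and `near_miss_margin_km(f, 12)` returns 2.0 km.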
We used an ensemble-of-ensembles approach, validated with cross-validation and followed by a monotonicity post-processing step:
- Gradient Boosting Ensemble: A blend of XGBoost, LightGBM, and CatBoost, with hyperparameters tuned using Optuna.
- Survival Component: A Random Survival Forest (RSF) that models time to hit directly and can learn from right-censored fires.
- Final Blend: A 50/50 weighted average of the Boosting Ensemble and the RSF.
- 5-Fold Stratified Cross-Validation: Each row's out-of-fold (OOF) prediction comes from a model that never saw that row during training, giving a fair estimate of generalization performance.
- Monotonicity Enforcement: Custom logic to ensure $P(12h) \le P(24h) \le P(48h) \le P(72h)$. A hit within a shorter horizon is also a hit within every longer one, so these cumulative probabilities cannot decrease as the horizon grows.
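The blending and monotonicity steps can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the project's code: it assumes the RSF's survival curve has already been evaluated at the four horizons, converts it to hit probabilities with $1 - S(t)$, and uses a running maximum as the monotonicity fix, which is one simple choice among several (the project's custom logic may differ). The function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def blend(boost_probs: Sequence[float], rsf_survival: Sequence[float],
          w_boost: float = 0.5) -> List[float]:
    """Weighted average of boosting and RSF hit probabilities (default 50/50).

    Both inputs are ordered by horizon (12h, 24h, 48h, 72h). rsf_survival holds the
    RSF survival estimate S(t) at each horizon, i.e. P(no hit by t), so the RSF's
    hit probability is 1 - S(t).
    """
    if len(boost_probs) != len(rsf_survival):
        raise ValueError("expected one value per horizon in both inputs")
    return [w_boost * p + (1.0 - w_boost) * (1.0 - s)
            for p, s in zip(boost_probs, rsf_survival)]


def enforce_monotonic(probs: Sequence[float]) -> List[float]:
    """Clip to [0, 1], then take a running maximum across increasing horizons."""
    clipped = [min(max(p, 0.0), 1.0) for p in probs]
    return list(accumulate(clipped, max))


# Example: the 48h boosting estimate dips below the 24h one.
final = enforce_monotonic(blend([0.10, 0.40, 0.20, 0.60], [0.90, 0.80, 0.80, 0.50]))
```

Because a survival curve never increases, the RSF's hit probabilities are already monotone, so any violation in the blend must come from the boosting side. A running maximum only ever raises later horizons; an alternative is isotonic regression across the four horizons (the pool-adjacent-violators algorithm), which averages violations out instead of carrying the earlier value forward.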
- Localized Brier Score: 0.003
- Validation Performance: Blending diverse model architectures reduced validation error compared with the individual models.
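The evaluation loop (OOF predictions from stratified 5-fold cross-validation, scored with the Brier score) can be sketched in plain Python. Here `fit_predict` is a hypothetical callable standing in for any of the models; the folds are built by dealing each class round-robin, a simplification of scikit-learn's `StratifiedKFold`; and the sketch computes the standard Brier score, since the exact definition of the "localized" variant is not given here.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical signature: train on (X_train, y_train), return probabilities for X_valid.
FitPredict = Callable[[list, list, list], Sequence[float]]


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[int]:
    """Give each row a fold id, dealing each class round-robin so folds keep its share."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            fold_of[i] = j % k
    return fold_of


def oof_predictions(X: list, labels: Sequence[int], fit_predict: FitPredict,
                    k: int = 5, seed: int = 0) -> List[float]:
    """Predict every row with a model that never saw that row during training."""
    folds = stratified_folds(labels, k, seed)
    oof = [0.0] * len(labels)
    for f in range(k):
        train = [i for i, g in enumerate(folds) if g != f]
        valid = [i for i, g in enumerate(folds) if g == f]
        preds = fit_predict([X[i] for i in train], [labels[i] for i in train],
                            [X[i] for i in valid])
        for i, p in zip(valid, preds):
            oof[i] = p
    return oof


def brier(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared gap between predicted probability and the 0/1 outcome (lower is better)."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)
```

With four horizons, one natural aggregate is to run this per horizon and average the four Brier scores; whether the competition metric weights horizons equally is an assumption here.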