A data science competition project from Virginia Tech that uses spatial and environmental data to predict the region where a rainfall-triggered landslide is likely to occur. It is built with decision tree-based machine learning models.
In which region did a landslide occur, given the environmental conditions and the characteristics of the landslide?
The goal is to use features such as rainfall, location, and date to accurately classify landslide occurrences into distinct regions of risk.
- Name: Global Landslide Catalog (GLC)
- Source: NASA Open Data Portal
- Years Covered: 2007–2015
- Size: 6,788 rows × 35 columns
- Purpose: Identify rainfall-triggered landslides worldwide
- Achieved 68.5% accuracy on test data
- Hyperparameters: ntree = 300, mtry = 18
- Further tuning did not yield significant improvement
- Achieved 69.5% accuracy without hyperparameter tuning
- Observed a lower training error (0.02), which suggests potential overfitting
- Landslide regions were defined using a 100-mile radius around events
- Formed clusters with at least 5 observations
- 86 distinct regions were identified as classification targets
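
The region definition above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: it assumes that two events belong to the same region when they lie within 100 miles of each other (directly or through a chain of neighbors), and that groups with fewer than 5 events are left unlabeled. The function names `haversine_miles` and `assign_regions` are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def assign_regions(points, radius_miles=100.0, min_size=5):
    """Return one region id per point, or None for points in clusters below min_size.

    Assumption: events are linked when they are within radius_miles of each
    other, and a region is a connected group of linked events.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of events that fall within the radius (O(n^2) comparisons).
    for i in range(n):
        for j in range(i + 1, n):
            if haversine_miles(*points[i], *points[j]) <= radius_miles:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)

    # Keep only clusters with enough observations to serve as a class label.
    labels = [None] * n
    next_id = 0
    for members in groups.values():
        if len(members) >= min_size:
            for i in members:
                labels[i] = next_id
            next_id += 1
    return labels
```

The pairwise loop is fine for a few thousand rows; a spatial index would be the natural replacement for larger catalogs.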
- Most landslides occurred in July and August
- This contrasts with the expected months, such as March and April
- Training Data: 2007–2012 (4,138 observations)
- Testing Data: 2012–2015 (2,644 observations)
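
A time-based split like the one above can be sketched as follows. The exact boundary within 2012 is not stated here, so the cutoff date below is an assumption, and the records are made-up placeholders rather than rows from the catalog. The helper name `temporal_split` is hypothetical.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split records into (train, test): events before the cutoff train the model,
    events on or after it are held out for testing."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Placeholder records for illustration only.
records = [
    {"id": 1, "date": date(2008, 7, 14)},
    {"id": 2, "date": date(2011, 8, 2)},
    {"id": 3, "date": date(2012, 6, 30)},
    {"id": 4, "date": date(2014, 3, 9)},
]

# Assumed cutoff: the start of 2012.
train, test = temporal_split(records, cutoff=date(2012, 1, 1))
```

Splitting by date rather than at random keeps later events out of training, so the test accuracy reflects prediction on events the model could not have seen.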
- Handle both discrete and continuous variables
- Perform well with high-dimensional spatial data (e.g., latitude and longitude)
- More robust against noise and overfitting compared to linear models
Hokie Hackers – Virginia Tech
- Ted Li
- Devanshu Khadka
- Drew Keely
- Nami Jain
Devanshu Khadka
LinkedIn
Email: khadkadevanshu@gmail.com
For academic use only. Contact authors for reuse or collaboration.