TEAM NAME ON CODALAB: MLPHANTS
Inspiration
Hi, we are the MLPhants team! Thomas and I (Martina) took up the ‘Hack the Crash’ challenge proposed by the QuantumBlack team. We not only love machine learning, we also love helping people out! We want to present our idea for solving this problem: embedding categorical variables as probability distributions.
What it does
Our solution addresses the problem presented to us by the QuantumBlack team: classifying the damage grades assigned to various buildings and areas. We embed each categorical feature as a low-dimensional vector.
How we built it
We built our model by taking, for each value of a feature, all examples in the dataset that share that value and looking at what percentage of them falls into each damage level. This gives a representation telling us, for example, that a building made of bamboo has a 15% chance of damage level 2 after an earthquake. Each such embedding is a very weak classifier on its own, but we have many of them, so we could build a strong classifier from many weak ones. To do so, we first performed principal component analysis, which constructed 75 new numerical features that explain 99.9% of the variance. The rest of the work was done by a gradient boosting algorithm that learns the final classification. Computing this embedding is very fast and consumes little computational power.
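The pipeline described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not our competition code: the column names (`material`, `roof`, `damage_grade`), the helper functions `fit_embeddings` and `apply_embeddings`, the uniform fallback for unseen values, and the use of scikit-learn's `GradientBoostingClassifier` as the boosting step are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier


def fit_embeddings(df, cat_cols, target):
    """Map every value of every categorical column to its damage-grade distribution."""
    grades = sorted(df[target].unique())
    tables = {}
    for col in cat_cols:
        # Rows: feature values. Columns: share of examples in each damage grade.
        table = pd.crosstab(df[col], df[target], normalize="index")
        tables[col] = table.reindex(columns=grades, fill_value=0.0)
    return tables, grades


def apply_embeddings(df, tables, grades):
    """Replace each categorical column by its per-grade probability vector."""
    # Assumption: values never seen during fitting get a uniform distribution.
    prior = np.full(len(grades), 1.0 / len(grades))
    parts = []
    for col, table in tables.items():
        emb = table.reindex(df[col]).to_numpy()  # unseen values become NaN rows
        parts.append(np.where(np.isnan(emb), prior, emb))
    return np.hstack(parts)


# Synthetic stand-in for the real dataset (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 600
toy = pd.DataFrame({
    "material": rng.choice(["bamboo", "brick", "concrete"], size=n),
    "roof": rng.choice(["light", "heavy"], size=n),
})
base = toy["material"].map({"bamboo": 1, "brick": 2, "concrete": 3}).to_numpy()
toy["damage_grade"] = np.clip(base + rng.integers(-1, 2, size=n), 1, 3)

# Fit the embeddings on the training rows only, so the held-out labels do not leak.
train, test = toy.iloc[:450], toy.iloc[450:]
tables, grades = fit_embeddings(train, ["material", "roof"], "damage_grade")
X_train = apply_embeddings(train, tables, grades)
X_test = apply_embeddings(test, tables, grades)

# Keep as many principal components as needed to explain 99.9% of the variance.
pca = PCA(n_components=0.999, svd_solver="full")
Z_train = pca.fit_transform(X_train)
Z_test = pca.transform(X_test)

# Gradient boosting combines the weak per-feature signals into one classifier.
clf = GradientBoostingClassifier(random_state=0)
clf.fit(Z_train, train["damage_grade"])
accuracy = clf.score(Z_test, test["damage_grade"])
print(tables["material"].round(2))
print(f"components kept: {Z_train.shape[1]}, held-out accuracy: {accuracy:.2f}")
```

Each row of `tables["material"]` is the kind of statement we mention above, such as "bamboo has an x% chance of damage level 2". Fitting the tables on the training split only is a design choice of this sketch: because the embedding is computed from the labels, building it on all rows would let the labels of the evaluation rows leak into the features.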
Challenges we ran into
Our main challenge was to properly understand the nuances of the dataset given to us and the links between its variables. Another challenge was selecting the best ML techniques (and sometimes the choice is not as obvious as it seems!).
Accomplishments that we're proud of
We achieved a very high score with our model and, over several model iterations, were able to improve it several times in a row.
What we learned
We learned a lot about ensembling models, data analysis, and the importance of using PCA to understand the distribution of the data.
What's next for Hack The Crash
Obviously, our model by itself is not going to be a self-sufficient solution to the given problem, but it surely has a lot of applications. Our model could be used to investigate data points for which the classification was very wrong, spot which buildings seem to be safe but in fact are not, and check whether they share common features. This could help to discover new weaknesses and improve building construction in the future. The model could also be combined with a mobile app that has access to geospatial data and could warn a user of being in or near a building with a high risk of damage in an earthquake (the alert could, for example, be triggered by seismological data). The app could also show the user the safest buildings or areas within walking distance where the user could take cover. Another useful feature for such an application would be a notification of the user's current location, triggered by a seismic shock, which could be helpful in a rescue operation. Additionally, our model could help insurance companies evaluate the risk of damage to buildings or properties based on their location and on the quality of adjacent buildings.
Built With
- boosting
- ensembling
- feature-engineering
- feature-selection
- jupyter
- machine-learning
- numpy
- pandas
- pca
- python
- scikit-learn
- xgboost

