Overview
Balancing public health against normal life is a challenge policymakers have faced throughout the past two years of the COVID-19 pandemic. This tool provides a reliable way to determine the level of public health measures required to curb the spread of COVID-19, based on a given country's past policies and its current public health situation (e.g. number of daily cases, vaccination rates, hospitalizations, pandemic-related deaths, etc.). Using ML and publicly available COVID-19 data for countries around the world, we are able to recommend a policy with an R-squared of up to 0.95 through our interactive web app. Our project is available on GitHub: [https://github.com/paulhinta/csip]
Inspiration
The COVID-19 pandemic has affected us all in different ways. While many of us understand the need for social restrictions to be implemented to limit the effects of the pandemic, we have also felt the strain of social isolation. Adjusting to this new reality can be difficult. We wanted to create a tool that might help policymakers optimize their restrictions given an epidemiological situation, allowing them to avoid imposing measures when they may not be strictly necessary. In turn, this would benefit the mental health of all, while still protecting everyone from the virus as much as possible.
How we built it
Dataset Used
The dataset was taken from the GitHub repository of "Our World in Data" on January 22nd, 2022: [https://github.com/owid/covid-19-data] The data was then cleaned and saved in the McHacks_ALLcovid.csv file, with each row representing the datapoint for a certain day in a certain country, drawn from a list of 24 countries. The total number of datapoints is 14,135. The desired output for this dataset is the Stringency Index, a measure that reflects the severity of COVID-related restrictions in a given country.
Data Preprocessing
The initial data table is first shuffled, and the features are separated from the Stringency Index output values. All features expressed as an absolute count of individuals, tests, or vaccine doses are normalized by dividing by the country's population.
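These preprocessing steps can be sketched with pandas. The column names below (`stringency_index`, `new_cases`, `population`, etc.) are illustrative and may not match the exact headers in McHacks_ALLcovid.csv:

```python
import pandas as pd

def preprocess(df: pd.DataFrame):
    """Shuffle rows, split features from the target, and normalize counts per capita."""
    # Shuffle so train/test splits are not ordered by country or date
    df = df.sample(frac=1, random_state=42).reset_index(drop=True)

    # Separate the target (Stringency Index) from the features
    y = df["stringency_index"]
    X = df.drop(columns=["stringency_index"])

    # Features counting individuals, tests, or vaccine doses are divided
    # by the country's population so countries of different sizes are comparable
    count_cols = ["new_cases", "new_deaths", "new_tests",
                  "total_vaccinations", "hosp_patients"]
    for col in count_cols:
        if col in X.columns:
            X[col] = X[col] / X["population"]
    return X, y
```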
ML Model
A random forest regression model (an ensemble of decision trees) was adopted. Using 5-fold cross-validation, it was determined that the forest should consist of 200 estimators (trees) with a maximum depth of 28. With those parameters, the algorithm generated predictions on the test data with an R-squared of 0.951 against the actual test labels. Work related to the training of this model can be found in the DecisionTree_RandomForest.ipynb notebook.
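The training setup can be sketched with scikit-learn using the stated hyperparameters. The data below is synthetic stand-in data, not the real dataset, and the exact cross-validation code lives in the notebook:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for the preprocessed feature matrix and the
# Stringency Index target (the real dataset has 14,135 rows)
rng = np.random.default_rng(0)
X = rng.random((500, 6))
y = 100 * X[:, 0]  # toy target in the 0-100 Stringency Index range

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameters selected via 5-fold cross-validation in the project:
# 200 estimators (trees), maximum depth of 28
model = RandomForestRegressor(n_estimators=200, max_depth=28, random_state=0)
model.fit(X_train, y_train)

score = r2_score(y_test, model.predict(X_test))
```

In practice the cross-validation itself could be run with `sklearn.model_selection.GridSearchCV` over a grid of `n_estimators` and `max_depth` values.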
Challenges we ran into
At the start of the hackathon, we had difficulty selecting a project. We had many ideas, and it was hard to settle on just one. Eventually, we discovered the Our World in Data COVID-19 database, which allowed us to pursue the Stringency Index project. However, the downloaded data came with its own challenges: it was often sparse and inconsistent, and many hours of manual preprocessing were necessary. Before settling on a Random Forest model, we also attempted to optimize a Multilayer Perceptron (MLP) model in parallel, but its performance continued to lag behind the Random Forest's despite hyperparameter tuning. Once the model had been developed and the frontend was ready, deploying the final application on Heroku brought further challenges, largely due to our limited full-stack experience: while we had used Heroku in the past to deploy Flask or React apps, those were only small-scale projects. This weekend was the first time we independently created a Flask API from scratch and linked it to a React frontend and a Mongo database. Ultimately, we were unable to route the app properly on Heroku, and given the time constraints we could not deploy successfully.
Accomplishments that we're proud of
Through the difficulties in selecting a project and in developing our idea, we persevered and submitted a project that makes us all proud. We are particularly proud of our random forest implementation, as well as the way in which we connected our Python backend to the React frontend.
What we learned
This was the first time we trained a model in one environment (Google Colaboratory), saved its parameters through pickling, and deployed the trained model in another environment (our backend script). As hinted at above, we also learned how to develop and implement a random forest machine learning model.
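The pickling workflow can be sketched as follows; the file name and toy data here are illustrative:

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# In the training environment (e.g. a Colab notebook): fit the model...
X = np.random.rand(50, 4)
y = X[:, 0]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# ...and serialize the fitted model to disk
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# In the deployment environment (e.g. a backend script): load it back
# and serve predictions without retraining
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

pred = loaded.predict(X[:1])
```

One caveat we'd flag: unpickling generally requires the same scikit-learn version in both environments, so the training and serving dependencies should be pinned together.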
What's next for COVID-19 Stringency Index Predictor
Next, we would like to optimize our data preprocessing to account for the lag between new cases and new hospitalizations; the two values are often correlated, with hospitalizations trailing cases by roughly two weeks. We would also like to further polish our frontend user interface to improve the user experience.
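Such a lag feature could be added with a simple pandas shift; the column names and 14-day offset below are illustrative:

```python
import pandas as pd

# Hypothetical daily series for one country; hospitalizations tend to
# trail new cases, so shifting cases forward ~14 days aligns the two.
df = pd.DataFrame({
    "new_cases": range(30),
    "hosp_patients": range(30),
})

# Add a lagged-cases feature: the case count from 14 days earlier
df["cases_14d_ago"] = df["new_cases"].shift(14)
```

The first 14 rows of the new column are NaN, so they would either be dropped or imputed before training.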