Inspiration

Our group was drawn to healthcare topics that affect our society. The Beginners Track Challenge was the perfect opportunity to explore them through data science.

What it does

Using two datasets containing age, sex, and race data, we identified the characteristics of facilities where funding would have the most impact, and the locations where opening new facilities would make the most sense.

How we built it

We used the pandas, NumPy, scikit-learn, and Plotly libraries in Python. We organized and cleaned the provided datasets, visualized the facilities in each state, and built a linear regression model to predict the population per facility in each state by 2025.
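As a rough illustration of the modeling step, here is a minimal sketch of fitting a linear regression on yearly values and extrapolating to 2025 with scikit-learn. The years, the values, and the variable names are made up for the example and are not taken from our datasets; our actual features and per-state setup may have differed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical yearly "population per facility" values for a single state
# (illustrative numbers only, not from the challenge datasets).
years = np.array([2015, 2016, 2017, 2018, 2019, 2020]).reshape(-1, 1)
pop_per_facility = np.array([5000.0, 5100.0, 5200.0, 5300.0, 5400.0, 5500.0])

# Fit a straight line: pop_per_facility ~ slope * year + intercept
model = LinearRegression()
model.fit(years, pop_per_facility)

# Extrapolate the fitted trend to 2025
prediction_2025 = float(model.predict(np.array([[2025]]))[0])
print(f"Predicted population per facility in 2025: {prediction_2025:.0f}")
```

Repeating the fit once per state would give one 2025 estimate per state. A straight-line extrapolation like this assumes the past trend continues unchanged.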

Challenges we ran into

One of the datasets was extremely disorganized, so we had to put a lot of effort into cleaning, sorting, and filtering the data. Building the linear regression model also took time because of the data extraction and wrangling it required.

Accomplishments that we're proud of

We completed the project with clear, attractive visualizations, and we were satisfied with the design of our machine learning model.

What we learned

We learned how to construct a choropleth plot using Plotly and GeoJSON, and how to de-pivot a pandas DataFrame. We also wrote a function to convert Plotly graphs to PNG files.
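De-pivoting turns a wide table (one column per year, for example) into a long table with one row per observation, which is the shape most plotting and modeling code expects. The sketch below uses pandas `melt` for this. The column names and numbers are hypothetical and only show the idea.

```python
import pandas as pd

# Hypothetical wide table: one row per state, one column per year
wide_df = pd.DataFrame({
    "state": ["CA", "TX"],
    "2019": [100, 80],
    "2020": [110, 85],
})

# De-pivot: keep "state" as the identifier and stack the year columns
# into two new columns, "year" and "facilities"
long_df = wide_df.melt(id_vars="state", var_name="year", value_name="facilities")

# Year labels come from column names, so convert them from strings to integers
long_df["year"] = long_df["year"].astype(int)

print(long_df)
```

The result has one row for each state and year pair, so it can be filtered, grouped, or passed to a plotting function by column name.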

What's next for Beginners Track Challenge

We will apply the techniques we learned in this challenge to our future data analysis work.
