K-Boston Analysis

Final result of the clustering algorithm. The 'X' marks where we recommend building the next community center.

Inspiration

Oscar and I were inspired by the classic Boston housing prices dataset. However, we recognized that this dataset has an ethical issue: its creators assumed that racial self-segregation had a positive impact on house prices. For this reason, we turned to the Climate Ready Boston Social Vulnerability dataset, which is based on a 2016 survey.
Our use of k-means clustering was inspired by an astrophysics professor whom Oscar and I both had for a semester. In that class, we used k-means to cluster stars and other dense astrophysical data. When we reached out to a professor outside the astrophysics department, he suggested applying k-means clustering to the Boston climate dataset.

What it does

Our code first visualizes the data to understand what it represents. We then compute a correlation matrix to decide which features to train our model on. Next, we determine the optimal number of clusters. Finally, we use k-means clustering to locate the areas of Boston most in need of additional infrastructure.
Alongside these cluster centers, we plot the locations of existing public schools and community centers to see which areas already have some infrastructure. With this additional information, an easy-to-read plot shows which areas most need new facilities.
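The clustering steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not our project code: it assumes the selected features are already a numeric matrix `X` with one row per neighborhood, implements Lloyd's k-means with several random restarts, and uses the "elbow" of the inertia curve as one common way to choose the number of clusters. The function names (`kmeans`, `best_of_n`, `inertia_curve`) and the toy data are hypothetical. scikit-learn's `KMeans` (with `n_clusters`, `n_init`, and the fitted `inertia_` attribute) provides the same algorithm ready-made.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of X. Returns (centers, labels, inertia)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct rows of X chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment and inertia (sum of squared distances to nearest center).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = float(d2[np.arange(len(X)), labels].sum())
    return centers, labels, inertia


def best_of_n(X, k, n_init=10, seed=0):
    """Run k-means from several random starts and keep the lowest-inertia run."""
    runs = [kmeans(X, k, seed=seed + i) for i in range(n_init)]
    return min(runs, key=lambda run: run[2])


def inertia_curve(X, k_values, n_init=10):
    """Inertia for each candidate k; the 'elbow' is where the drop levels off."""
    return {k: best_of_n(X, k, n_init=n_init)[2] for k in k_values}


# Toy example (hypothetical data): two obvious groups of points.
X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
curve = inertia_curve(X, range(1, 4))
centers, labels, inertia = best_of_n(X, 2)
print(curve)
print(centers, labels, inertia)
```

On the toy data, inertia falls sharply from k = 1 to k = 2 and only slightly after that, so the elbow sits at k = 2. Restarting from several seeds guards against k-means settling into a poor local minimum.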
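One way to compare cluster centers against existing facilities is to measure how far each center is from its nearest school or community center. The sketch below assumes the centers and facility locations share the same projected (x, y) coordinates in meters (for example after geopandas `GeoDataFrame.to_crs`), and it treats the center farthest from any facility as the least served. That ranking rule, the function names, and the coordinates are hypothetical illustrations, not necessarily how we placed the 'X'.

```python
import numpy as np


def nearest_facility_distance(centers, facilities):
    """Euclidean distance from each cluster center to its closest facility."""
    centers = np.asarray(centers, dtype=float)
    facilities = np.asarray(facilities, dtype=float)
    # Pairwise distances with shape (n_centers, n_facilities).
    diff = centers[:, None, :] - facilities[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist.min(axis=1)


def least_served_center(centers, facilities):
    """Index of the cluster center that is farthest from every facility."""
    return int(np.argmax(nearest_facility_distance(centers, facilities)))


# Hypothetical projected coordinates in meters.
centers = [[0, 0], [1000, 0], [0, 2000]]
facilities = [[100, 0], [1000, 300], [500, 500]]
print(nearest_facility_distance(centers, facilities))
print(least_served_center(centers, facilities))
```

Here the third center is about 1,581 m from its closest facility, so it would be flagged first. Distances only make sense in a projected CRS; raw longitude/latitude degrees are not uniform in length.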

How we built it

The Climate Ready Boston Social Vulnerability dataset contains geographic boundaries for each neighborhood, along with the disadvantaged populations that live there. After converting the polygon geometries into (x, y) coordinates, we ran a clustering algorithm in a high-dimensional space that combines all the features to group the neighborhoods into disadvantaged areas. We reduced dimensionality with PCA and used geopandas to parse the .geojson files.
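The PCA step can be sketched with NumPy's SVD. This sketch assumes the features are standardized first (our assumption; the scaling is not specified above), and `pca_reduce` plus the toy matrix are hypothetical. scikit-learn's `PCA(n_components=...)`, with its `explained_variance_ratio_` attribute, is the usual library route.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project standardized rows of X onto their top principal components.

    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so features measured on different scales weigh equally.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / (S ** 2).sum()
    return scores, explained[:n_components]


# Toy matrix (hypothetical): every column is a multiple of the first.
X = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
scores, ratio = pca_reduce(X, 1)
print(scores.shape, ratio)
```

Because the toy columns are perfectly correlated, one component captures essentially all of the variance; on real survey features, the explained-variance ratios help decide how many components to keep before clustering.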

Challenges we ran into

Neither of us had worked with GeoJSON files before. Figuring out how to map the polygons in a GeoJSON file onto a traditional Cartesian (x, y) coordinate system was a challenging step along the way.
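A common way to reduce each neighborhood polygon to a single (x, y) point is its area-weighted centroid. In geopandas this is the `centroid` property of a GeoSeries (ideally after projecting to a planar CRS with `to_crs`), whose `.x` and `.y` give the coordinates. The pure-Python sketch below shows the shoelace formula behind it; `polygon_centroid` is a hypothetical helper, it handles only a simple exterior ring (no holes or MultiPolygons), and it is not necessarily how our notebook did the conversion.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    The ring may repeat its first vertex at the end, as GeoJSON rings do.
    """
    pts = list(ring)
    if pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the closing vertex; the loop wraps around
    area2 = 0.0  # twice the signed area (shoelace sum)
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * A), and 6 * A == 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


# GeoJSON-style exterior ring of a 2 x 2 square (hypothetical coordinates).
square = [[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]
print(polygon_centroid(square))
```

Because the signed area appears in both numerator and denominator, the result is the same whether the ring is ordered clockwise or counterclockwise.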

Accomplishments we’re proud of

We are proud that our code addresses a real-world problem and could have a lasting impact if its findings are taken seriously. Oscar and I also problem-solved, communicated effectively, and worked as a team to finish the project in a short amount of time, and we are both proud of that efficiency and communication.

What’s next

Our code can be applied to any city, given sufficient data. We can extend this work to other regions and offer tools to narrow down which areas might require additional support and resources. In the future, if new surveys capture a more comprehensive range of communities, we could further refine which areas of a given region would benefit most.