Crash Data for the City of Champaign & Crash Data Visualization API

NOTICE: Parts of this documentation have been cut out or reformatted. Please see the full Documentation for the omitted sections, including more details on the API, Results, and Contact Information.

Introduction

Our team of hackers set out to make car crash data visualization more accessible. We used pre-existing data from the City of Champaign to generate statistics on crashes and a map of crash concentration.

Our goal was to use the City of Champaign's open data to identify traffic accidents and provide data analysis tools that let users visually spot patterns related to those accidents. The analysis is intended to bring awareness to intersections with high accident rates, so that improvements can be made to reduce accidents in the future. We developed an extensible algorithm that takes many different parameters into account and returns a score for each accident. Higher scores indicate higher accident rates, so the intersections with the highest scores should be examined for possible improvements to make drivers safer.

Higher scores are plotted with larger red dots, while lower scores are plotted with smaller green dots.

The data was obtained from http://data.cuuats.org/datasets?q=champaign&sort_by=relevance

Brief Guide to the Algorithm

The algorithm has five main steps:

1) Determine a score for each point based on user inputs related to the circumstances of the accident. A higher score indicates there may be a problem with the intersection.
2) Put each point into a cluster, if possible.
3) Greatly increase the score of every point in a cluster based on the cluster's size; points in larger clusters therefore end up with much larger scores, drawing attention to areas with high accident rates.
4) Plot the data using gmplot (https://github.com/vgm64/gmplot).
5) Cache the plot to decrease loading times in the future.
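The steps above can be sketched as a small pipeline. This is an illustrative sketch only: the function names (`base_score`, `analyze`) and the lowercase field names are hypothetical stand-ins, not the project's actual code.

```python
# Illustrative sketch of the five steps above; names are hypothetical.
from collections import defaultdict

def base_score(crash):
    # Step 1 (simplified): per-crash score from its circumstances;
    # the real weighted formula is detailed later in this guide.
    return crash.get("score", 1.0)

def analyze(crashes, base=2.0):
    # Step 2: bucket crashes by rounded coordinates, treating nearby
    # points as the same intersection or stretch of road.
    clusters = defaultdict(list)
    for i, crash in enumerate(crashes):
        key = (round(crash["lat"], 3), round(crash["lng"], 3))
        clusters[key].append(i)
    # Step 3: amplify every clustered point's score by cluster size
    # (linear mode: NumberOfPoints * Base).
    scores = [base_score(c) for c in crashes]
    for members in clusters.values():
        if len(members) > 1:
            for i in members:
                scores[i] *= len(members) * base
    # Steps 4 and 5 would plot the scored points with gmplot and
    # cache the rendered map for identical future requests.
    return scores
```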

Detailed Guide to the Algorithm

The score is based on multiple fields provided by the data set, adjusted for external conditions. Each record contains values for Crash Severity, Collision Type, Weather, Light, Road Conditions, Location, Fatalities, and Injuries, among others. Because we are attempting to identify problems with specific roads, we want to reduce the influence of external factors on drivers, such as weather, light, and road conditions. In poor conditions, the score is adjusted downward using either default or user-provided modifiers. The rationale for this adjustment is that poor conditions cause more accidents regardless of the specific intersection; because we are looking for intersections to improve, we do not want external factors skewing the data.

For example, a crash at an intersection in daylight is usually less likely than a crash at the same place at night in snow. Therefore, if many crashes do occur at that intersection in daylight, it should receive a much higher score.

The score for each point is determined by the formula

WeatherScore * WeatherModifier + LightScore * LightModifier + RoadConditionsScore * RoadConditionsModifier + SeverityScore * SeverityModifier

All of these variables have default values, but can be modified by users.
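Read as code, the formula might look like the following. This is a sketch: the dictionary layout mirrors the Default Values section below, while `point_score` and the lowercase field names are illustrative, not the project's actual identifiers.

```python
def point_score(crash, scores, modifiers):
    # Weighted sum from the formula above. Each category's score is
    # looked up in its dictionary; unknown values fall back to 1.0.
    return (scores["WEATHER"].get(crash["weather"], 1.0) * modifiers["WEATHER"]
            + scores["LIGHT"].get(crash["light"], 1.0) * modifiers["LIGHT"]
            + scores["ROAD"].get(crash["road"], 1.0) * modifiers["ROAD"]
            + scores["SEVERITY"].get(crash["severity"], 1.0) * modifiers["SEVERITY"])
```

With all modifiers at their default of 1.0, a clear-weather, daylight, dry-road fatal crash scores 1.0 + 1.0 + 1.0 + 1.0 = 4.0.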

After each point's score is determined, points are grouped into "clusters" representing points that occur at the same intersection or stretch of road. Each point in a cluster has its score multiplied based on the number of points in the cluster. This is done to identify areas with high accident rates: if one area has a cluster of many points, it is likely a more dangerous intersection.

For example, if there is a cluster of 10 points, each point's individual score would be multiplied by

NumberOfPoints * Base = 10 * 2 = 20

Therefore, a point that initially had a score of 5 would end up with a score of 100. This is intended to bring much more attention to clusters.
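That adjustment can be written out as a small function. This is a sketch: `cluster_multiplier` is an illustrative name, and the exponential mode it supports is the one described under the CLUSTERING key in the API section.

```python
def cluster_multiplier(n_points, mode=1.0, base=2.0):
    # Linear mode (MODE == 1): NumberOfPoints * Base.
    if mode == 1.0:
        return n_points * base
    # Any other MODE: exponential, Base ** NumberOfPoints.
    return base ** n_points
```

With the defaults, a 10-point cluster multiplies each score by 10 * 2 = 20, so a point that started at 5 ends at 100.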

Caching

The analyzed data is cached and stored in the cloud. Because users can supply their own values and modifiers, we developed an algorithm that caches the data keyed on those inputs, so that they, or anyone else submitting the same inputs, can quickly access it later.
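One way to key such a cache (a sketch, not necessarily the project's actual scheme) is to canonicalize the submitted inputs and hash them, so that logically identical inputs always map to the same stored plot:

```python
import hashlib
import json

def cache_key(user_inputs):
    # Serialize with sorted keys so logically equal inputs produce the
    # same string, then hash it for a compact, filename-safe key.
    canonical = json.dumps(user_inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```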

API

The application has an API which can be called from the website. A textarea accepts JSON text, which is POSTed to the server. The submitted values are then used in place of the default values; any values left unentered keep their defaults.

After submitting the request, the inputs will be used to generate a new visualization of the data, and that visualization will then be cached in the cloud for future use for anybody who uses the same inputs again.
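Filling unentered values with defaults amounts to recursively merging the submitted JSON over the default dictionary. This is a sketch of that behavior; `merge_over_defaults` is an illustrative name, not the project's actual function.

```python
def merge_over_defaults(defaults, overrides):
    # Return a copy of `defaults` with any value present in
    # `overrides` substituted, recursing into nested dictionaries
    # so sibling defaults are preserved.
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_over_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For instance, a request that only sets "FATAL" inside "FATALITY_DICTIONARY" replaces that single entry and leaves every other default untouched.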

Default Values

These are the default values, in the same format a request must follow.

{
    "WEATHER_DICTIONARY" : {
        "CLEAR" : 1.0,
        "CLOUDY/OVERCAST" : 1.0,
        "FOG/SMOKE/HAZE" : 0.4,
        "RAIN" : 0.3,
        "SEVERE CROSS WIND" : 0.8,
        "SLEET" : 0.25,
        "SNOW" : 0.2
    },
    "LIGHT_DICTIONARY" : {
        "DARKNESS" : 0.6,
        "DARKNESS, LIGHTED ROAD" : 0.9,
        "DAWN" : 0.75,
        "DAYLIGHT" : 1.0,
        "DUSK" : 0.75
    },
    "ROAD_DICTIONARY" : {
        "DRY" : 1.0,
        "ICE" : 0.1,
        "SAND, MUD, DIRT" : 0.9,
        "SNOW OR SLUSH" : 0.2,
        "WET" : 0.3
    },
    "FATALITY_DICTIONARY" : {
        "A-INJURY" : 1.0,
        "FATAL" : 1.0
    },
    "MODIFIERS" : {
        "WEATHER" : 1.0,
        "LIGHT" : 1.0,
        "SEVERITY" : 1.0,
        "ROAD" : 1.0
    },
    "CLUSTERING" : {
        "MODE" : 1.0,
        "BASE" : 2.0
    }
}

A "MODE" of 1 selects the linear clustering formula; any other value selects the exponential formula, as described below.

For example, if you wanted fatal accidents to have a score of 2.0 and wet road conditions to have a score of 0.2, leaving everything else at its default, this is the request you would submit:

  {
      "FATALITY_DICTIONARY": {
          "FATAL" : 2.0
       },

       "ROAD_DICTIONARY": {
           "WET" : 0.2
       }
  }

The 'MODIFIERS' key holds per-category weights that multiply every score in that category. For example, if you wanted weather to count twice as much and crash severity to have no effect, your request would be:

{
    "MODIFIERS" : {
         "WEATHER" : 2.0,
         "SEVERITY" : 0.0
     }
}

The 'CLUSTERING' key controls the formula used for clustering. As previously stated, the default formula is NumberOfPoints * Base. You can modify the base, or set 'MODE' to any value other than 1 to use the exponential formula Base ^ NumberOfPoints instead.
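For example, a request selecting the exponential formula with a base of 1.5 would be

{
    "CLUSTERING" : {
         "MODE" : 0.0,
         "BASE" : 1.5
     }
}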

Externally calling the API

The API can be called externally through a POST request using AJAX. For example, here is a sample submission in JavaScript (using jQuery):

var request = {"CLUSTERING" : { "BASE" : 3.0} };
request = JSON.stringify(request);

// jQuery's $.post sends a form-encoded POST with the JSON in the "api" field.
$.post("https://champaign-amulamul.c9users.io/custom/views.custom",
    {
        api: request
    }
);

The server will then process the request and store the data from the request. Due to the nature of the data, no value is returned to the client.
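For reference, the equivalent POST can be built with Python's standard library. This is a sketch: the Cloud9 endpoint above may no longer be live, so the actual send is left commented out.

```python
import json
from urllib import parse, request

# Same payload as the jQuery example: the JSON request goes in the
# form-encoded "api" field.
payload = {"api": json.dumps({"CLUSTERING": {"BASE": 3.0}})}
req = request.Request(
    "https://champaign-amulamul.c9users.io/custom/views.custom",
    data=parse.urlencode(payload).encode("utf-8"),
    method="POST",
)
# request.urlopen(req)  # the server returns no value to the client
```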

Results

Based on the results, there is a particularly large number of accidents at two locations: the intersection of E Springfield Ave and S Randolph St, and the intersection of Bloomington Rd and N Prospect Ave, along with several smaller spikes in accident counts at other locations. While no specific causes can be determined, the correlation between each of these points and a significant number of accidents may indicate that improvements can be made to the current intersections.

Questions

If you have any questions, please email bakkejor[at]msu.edu or gillisa3[at]msu.edu.
