Statistically Improving Default Speed Assignments #3021

# What's the problem?

There are two fairly important considerations when building a routing engine: path selection, and the estimated time of arrival (ETA) for traveling that path. This issue focuses on the latter, though in reality the two are not independent problems.

If you take Valhalla, build a routing tileset from the OSM planet, and run routes on the resulting tileset, you will notice an important deficiency: in certain locales the estimated travel time for a route is quite inaccurate. I would say it's especially deficient in urban and heavily trafficked locales. Basically, the ETAs we provide out of the box are more akin to driving at night with no one else on the road. This is unfortunate for two reasons:

1. Most people are driving during the day
2. Most people are concentrated in more heavily populated areas

Both of these mean that it's statistically more likely that we'll get the kind of routing request that fares poorly with our current default ETAs (which do well at night or in rural areas with less road congestion).

# Why does this happen?

OSM is an amazing dataset without which Valhalla wouldn't exist (nor would countless other companies or divisions of companies). For the purposes of display and basic routing, the dataset has tremendous value. But there are some problems. Mainly, we have to make some assumptions about the speed a motor vehicle will travel on a given edge in the graph. We use a few factors to make this assumption:

1. Road class (FRC from the highway tag)
2. Road use (FoW from various tags, roundabout, *_link, turn channel, driveway, track, ...)
3. Road density (urban vs rural)
4. Speed limit (if available)

The general flow of how a default speed gets assigned is:

1. `graph.lua` assigns a default speed to the way based on its highway tag (FRC). For some special cases of low-class road uses (driveways and tracks) it sets even lower speeds.
2. `mjolnir::PBFGraphParser` checks for other speed tags like speed limit, assumed speed, truck speed, etc. and stores those on the way as well.
3. `mjolnir::GraphBuilder` builds the initial set of tiles, and in doing so sets the initial speed for each edge in the graph. Here it prefers any tagged speed over the default speed computed in `graph.lua`.
4. `mjolnir::GraphEnhancer` then determines the road density around a given edge and uses it, along with other derived uses (turn channel, etc.), to further modify the speed assigned to the edge (basically to reduce the speeds of urban edges and turn channels).
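To make the order of precedence concrete, here is a minimal Python model of the four steps above. It is a sketch, not Valhalla's code: the speed table, the urban and turn-channel factors, and the names `Way`, `default_speed`, `initial_edge_speed` and `enhanced_edge_speed` are all hypothetical, and it assumes (for illustration only) that the enhancer adjusts only speeds that were not explicitly tagged.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default speeds in km/h keyed by highway tag (illustrative values only).
DEFAULT_KPH = {"motorway": 105, "trunk": 90, "primary": 75, "secondary": 60,
               "tertiary": 50, "residential": 30, "service": 20}
# Hypothetical even-lower speeds for special low-class uses (step 1).
SPECIAL_USE_KPH = {"driveway": 10, "track": 15}
FALLBACK_KPH = 25  # hypothetical speed for highway values not in the table


@dataclass
class Way:
    highway: str
    use: Optional[str] = None           # e.g. "driveway", "track", "turn_channel"
    tagged_kph: Optional[float] = None  # from maxspeed and similar tags (step 2)


def default_speed(way: Way) -> float:
    """Step 1: default speed from the highway tag, lowered for driveways and tracks."""
    if way.use in SPECIAL_USE_KPH:
        return SPECIAL_USE_KPH[way.use]
    return DEFAULT_KPH.get(way.highway, FALLBACK_KPH)


def initial_edge_speed(way: Way) -> float:
    """Step 3: prefer any tagged speed over the computed default."""
    return way.tagged_kph if way.tagged_kph is not None else default_speed(way)


def enhanced_edge_speed(way: Way, urban: bool) -> float:
    """Step 4: reduce urban and turn-channel speeds (factors are made up).

    Assumption for this sketch: tagged speeds are left untouched.
    """
    speed = initial_edge_speed(way)
    if way.tagged_kph is not None:
        return speed
    if urban:
        speed *= 0.8
    if way.use == "turn_channel":
        speed *= 0.9
    return speed
```

The point of the model is that every untagged edge ends up with a speed derived only from class, use and density, which is exactly the part the rest of this proposal wants to replace with measured values.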

# What can we do about it?

## Make it configurable

What most people do is ingest historical or live traffic data with high enough coverage that it outweighs the inaccuracies of the default speed assignments. I'd like to propose a different route (pun intended): modify the graph enhancer to support two modes of operation:

**Normal Mode**: By default, work exactly as it does today. Changes to default speeds are significant, so it's best not to rock anyone's boat by upending them for users who don't opt in.

**Configuration Mode**: Add a configuration option under the `mjolnir` section (blank by default, i.e. disabled) that points to another JSON file with detailed speed information.
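A minimal sketch of the opt-in switch, assuming a hypothetical key name `default_speeds_config` under `mjolnir` (the real option name is not decided here). A blank or absent value keeps Normal Mode; anything else is read as the path to the speed JSON.

```python
import json
from typing import Optional


def load_speed_config(mjolnir_config: dict) -> Optional[dict]:
    """Return the parsed speed config, or None to signal Normal Mode."""
    # Hypothetical option name; blank by default, i.e. disabled.
    path = mjolnir_config.get("default_speeds_config", "")
    if not path:
        return None  # Normal Mode: assign speeds exactly as today
    with open(path, encoding="utf-8") as f:
        return json.load(f)  # Configuration Mode
```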

The speed information for configuration mode would take the following form:

```json
{
  "urban": {
    "classes": [
      {
        "ramp/tc": 30,
        "roundabout": 25,
        "none": 40
      }...
    ]
  },
  "rural": {
    "classes": [
      {
        "ramp/tc": 30,
        "roundabout": 25,
        "none": 40
      }...
    ]
  },
  "us": {
    "urban": {
      "classes": [
        {
          "ramp/tc": 30,
          "roundabout": 25,
          "none": 40
        }...
      ]
    },
    "rural": {
      "classes": [
        {
          "ramp/tc": 30,
          "roundabout": 25,
          "none": 40
        }...
      ]
    },
    "pa": {
      "urban": {
        "classes": [
          {
            "ramp/tc": 30,
            "roundabout": 25,
            "none": 40
          }...
        ]
      },
      "rural": {
        "classes": [
          {
            "ramp/tc": 30,
            "roundabout": 25,
            "none": 40
          }...
        ]
      }
    }
  }
}
```

This information is hierarchical, but the basic premise is: given the characteristics of an edge, use the closest match to those characteristics to set its speed. You'll notice locale information, urban/rural classification, FRC (class), FoW (use) and other attributes. Locale serves as the basic hierarchical unit: if a locale is missing, you can pop up to a coarser locale to get the relevant information. Where the speed for a specific attribute is missing from our configuration, we may even use the speed of a related attribute instead.
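The locale fallback can be sketched as a walk from the most specific locale back up to the root. This Python sketch assumes two things the proposal does not pin down: that a locale is given as a coarse-to-fine path such as `("us", "pa")`, and that each entry in `classes` corresponds to one road class (FRC) by index. `lookup_speed` is a hypothetical helper; returning `None` leaves the edge to its Normal Mode speed.

```python
from typing import Optional, Sequence


def lookup_speed(config: dict, locale: Sequence[str], density: str,
                 road_class: int, use: str) -> Optional[float]:
    """Return the speed from the finest locale that has an entry, else None.

    Assumptions: `locale` is ordered coarse to fine, e.g. ("us", "pa"), and
    `classes` is indexed by road class (FRC).
    """
    # Collect the chain of nodes: root, then "us", then "us"/"pa", ...
    nodes = [config]
    for code in locale:
        child = nodes[-1].get(code)
        if not isinstance(child, dict):
            break  # locale missing: stop descending, coarser nodes remain
        nodes.append(child)

    # Try the finest node first, then pop up to coarser locales.
    for node in reversed(nodes):
        classes = node.get(density, {}).get("classes", [])
        if road_class < len(classes) and use in classes[road_class]:
            return classes[road_class][use]
    return None  # nothing configured: caller keeps the Normal Mode speed
```

With a config shaped like the JSON above, an urban edge in Pennsylvania would resolve at `us`/`pa`, an urban edge in a US state without its own block would fall back to `us`, and an edge outside the US would fall back to the top-level `urban` block.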

## Gather some statistics

Once configuration mode is fully supported, we need to get speeds from somewhere. For that, we'll need to measure real drives to get mean or median speeds for the given types of roads in the given locales.

The best way to do this is to take a large corpus of GPS traces and use map matching to associate them with the graph. We'll want to filter the traces by time and location, but also by looking at timestamps, speeds and noisiness, so that we don't skew our metrics. An important filter is mode detection (we want to discard bicycle and walking traces and keep motor vehicle traces), which I think we can do by filtering out traces that travel continuously at low speeds. We'll certainly need to tweak this a bit to find what works best. Once we have the set of GPS traces we actually want to use, we can match them to the graph using the `trace_attributes` endpoint. This gives us metadata about each edge in the match, including the attributes we will break our statistics down by in the JSON format above, and, most importantly, the timing (and therefore the speed) on each edge.
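The low-speed heuristic for mode detection, plus a basic noise check, could look like the following Python sketch. The thresholds (20 km/h, 90% of segments, 200 km/h) and the trace format of `(lat, lon, unix_seconds)` tuples are placeholder assumptions meant to be tuned, and `keep_trace` is a hypothetical helper that would run before any call to `trace_attributes`.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (lat, lon, unix time in seconds)

EARTH_RADIUS_M = 6371000.0


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in meters between two trace points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def segment_speeds_kph(trace: List[Point]) -> List[float]:
    """Speed between consecutive points; empty if timestamps ever fail to increase."""
    speeds = []
    for a, b in zip(trace, trace[1:]):
        dt = b[2] - a[2]
        if dt <= 0:
            return []  # bad timestamps: treat the whole trace as unusable
        speeds.append(haversine_m(a, b) / dt * 3.6)
    return speeds


def keep_trace(trace: List[Point], slow_kph: float = 20.0,
               slow_fraction: float = 0.9, max_kph: float = 200.0) -> bool:
    """Keep likely motor vehicle traces; drop mostly-slow or implausibly noisy ones."""
    speeds = segment_speeds_kph(trace)
    if not speeds:
        return False  # too short or bad timestamps
    if max(speeds) > max_kph:
        return False  # GPS jump: too noisy to trust
    slow = sum(1 for s in speeds if s < slow_kph)
    # Reject traces that are slow almost all the time (likely walking or cycling).
    return slow / len(speeds) < slow_fraction
```

A fraction-of-segments rule, rather than an average, is used so that a car stuck in traffic for part of a trace is still kept, which matters because congested daytime driving is exactly what we want to measure.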

So then the question becomes which corpus to use. I first looked at https://www.openstreetmap.org/traces, which is quite large, but the APIs for accessing those traces are quite lacking. There were bulk dumps of global traces from about 9 years ago, which would suit our needs, but frankly they are too old.

Thinking about it a bit more, I realized that another great source would be Mapillary. They have an open API (https://www.mapillary.com/developer/api-documentation/#sequences) that allows you to fetch traces and images for the purposes of improving OSM. The great news is that their API supports date ranges as well as geographic ranges, so we can do small tests in different places and then scale up when we are ready to run the whole thing, or slowly roll through it to keep the load low. In the end, it's a process we only need to run every so often. Maybe we can even get their support?
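Because the API accepts both date and geographic ranges, the crawl can be split into small windows that are easy to pilot in one place and then throttle across the planet. The sketch below only generates the windows; it does not call Mapillary, and how each window maps onto their actual query parameters is left open. `query_windows`, the 1-degree tile size and the 30-day window are assumptions.

```python
import math
from datetime import date, timedelta
from typing import Iterator, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def query_windows(bbox: BBox, start: date, end: date, tile_deg: float = 1.0,
                  days: int = 30) -> Iterator[Tuple[BBox, date, date]]:
    """Yield (tile, t0, t1) so the tiles cover bbox and consecutive [t0, t1) cover start..end."""
    min_lon, min_lat, max_lon, max_lat = bbox
    n_lon = math.ceil((max_lon - min_lon) / tile_deg)
    n_lat = math.ceil((max_lat - min_lat) / tile_deg)
    for i in range(n_lon):
        for j in range(n_lat):
            # Compute tile corners from indices to avoid accumulating float error.
            lon0 = min_lon + i * tile_deg
            lat0 = min_lat + j * tile_deg
            tile = (lon0, lat0, min(lon0 + tile_deg, max_lon), min(lat0 + tile_deg, max_lat))
            t0 = start
            while t0 < end:
                t1 = min(t0 + timedelta(days=days), end)
                yield tile, t0, t1
                t0 = t1
```

For a pilot, one would pass a small city-sized `bbox` and a short date range; for a planet run, the same generator could be consumed slowly with a pause between requests.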

The process should basically be a Python script that interacts with the Mapillary API as well as our map matching API to map speeds to types of edges in different locales. At the very end of the process, after we've gathered our observations, we can take their mean or median and create a JSON config file to be used as described above. Once we've completed a run for the planet, the resulting JSON file can be committed back to this repository, and we can also make an OSM wiki page with these values (akin to other locale-specific tagging suggestions and rules).
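The final aggregation step might look like this Python sketch. It assumes observations have already been reduced to `(locale_path, density, road_class, use, speed_kph)` tuples, takes the median per bucket, and credits every observation to each coarser locale as well so the fallback levels are populated. The eight-entry `classes` list, `min_samples`, and the function names are assumptions, not part of the proposal.

```python
import json
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (locale path coarse-to-fine, "urban"/"rural", road class index, use key, observed km/h)
Observation = Tuple[Tuple[str, ...], str, int, str, float]

NUM_CLASSES = 8  # assumption: one `classes` entry per road class (FRC 0-7)


def build_speed_config(observations: Iterable[Observation], min_samples: int = 1) -> dict:
    """Bucket observed speeds and emit the nested config using per-bucket medians."""
    buckets: Dict[tuple, List[float]] = defaultdict(list)
    for locale, density, road_class, use, kph in observations:
        # Credit the observation to its own locale and every coarser ancestor
        # (including the root), so fallback levels get values too.
        for depth in range(len(locale) + 1):
            buckets[(locale[:depth], density, road_class, use)].append(kph)

    config: dict = {}
    for (locale, density, road_class, use), speeds in buckets.items():
        if len(speeds) < min_samples:
            continue  # too few samples to trust; leave it to the fallback
        node = config
        for code in locale:
            node = node.setdefault(code, {})
        density_node = node.setdefault(density, {})
        classes = density_node.setdefault("classes", [{} for _ in range(NUM_CLASSES)])
        classes[road_class][use] = round(statistics.median(speeds))
    return config


def write_speed_config(config: dict, path: str) -> None:
    """Write the config as pretty-printed JSON for committing or publishing."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

The median is used here because it is less sensitive than the mean to the occasional mismatched or stalled trace that survives filtering; swapping in `statistics.mean` is a one-line change if the mean turns out to work better.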

@ptpt @gyllen what do you guys think about this idea?

# Conclusion

This process should result in an artifact that we can publish to the OSM wiki and this repository, which should allow other projects to improve OSM-based motor vehicle routing. I do not expect it to compete with actual traffic products from entrenched vendors, since their data coverage and models apply at the individual edge level and are thus far more accurate. Here we are just trying to raise the baseline a little so that ETAs and selected routes aren't off by 50%.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Statistically Improving Default Speed Assignments #3021

Whats the problem?

Why does this happen?

What can we do about it?

Make it configurable

Gather some statistics

Conclusion

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Statistically Improving Default Speed Assignments #3021

Description

Whats the problem?

Why does this happen?

What can we do about it?

Make it configurable

Gather some statistics

Conclusion

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions