-
Notifications
You must be signed in to change notification settings - Fork 879
De-Singleton tile_extract_t to Allow for Config (tileset) Reloading #3117
Description
At the moment graphreader uses a singleton especially for the most performant mode of operation, memory mapped tar file. This is done in large part because scanning and enumerating what is in the tar takes a decent amount of time if the file isnt already memory mapped. In this way the singleton is filling its typical role as a cache for other threads to make use of. This has a pretty steep down side though which is that once the extract is loaded its not possible to load another one. We've worked around this for unit tests by subclassing graphreader and adding methods to it to allow this:
Lines 404 to 416 in 916cea1
| std::shared_ptr<valhalla::baldr::GraphReader> | |
| make_clean_graphreader(const boost::property_tree::ptree& mjolnir_conf) { | |
| // Wrapper sub-class to allow replacing the statically initialized | |
| // tile_extract member variable | |
| struct ResettingGraphReader : valhalla::baldr::GraphReader { | |
| ResettingGraphReader(const boost::property_tree::ptree& pt) : GraphReader(pt) { | |
| // Reset the statically initialized tile_extract_ member variable | |
| tile_extract_.reset(new valhalla::baldr::GraphReader::tile_extract_t(pt)); | |
| } | |
| }; | |
| return std::make_shared<ResettingGraphReader>(mjolnir_conf); | |
| } |
This solution is not scalable though in a multithreaded context so we must do something else to make this possible.
You'll notice that the actual thing we are talking about here is the tile_extract_t struct which houses both the extracts (graph tiles as well as speed tiles for real time speed). Both files reflect each other in terms of organization, a tar with tiles in it each of which has a file name that tells which tile it is. What costs time is running over the whole file to discover what and where tiles are located in the file. If getting this information were trivial the process could be sped up such that a singlton wouldnt be needed.
Many years ago we had such an implementation, not in valhalla but non-the-less for memory mapping a large file. in this implementation we passed an index which told where the bits were in the large file so that the program didnt need to scan it first. here we could do the same thing. pass an index with the tar that allows the tile_extract_t construct itself in very short time. The index would need a map, like that is found in the extract, of tile_id to byte_offset within the large tar file.
I think it would be nice not to require this index but to use it if its there. This means that configuring will take a long time in the case that it is not supplied but i dont see any other way to be lenient. the other option is to not be lenient and error when no index is provided.
The other question is how to provide the index. It would be really cool if we could make it the first file in the tar extract. this would require some trickery. i believe this could be done the following way:
- create the tar archive as normal with all the files in it
- enumerate the files and their byte offsets in an index file, format should be fixed sized struct {uint32_t tile_id, uint64_t byte_offset}
- create a tar with only the index file in it
- modify the index file by shifting all the offsets by the size of the tar with just the index file in it
- recreate tar file with just the index in it (index is updated)
- cat the larger tar with all the tiles in it onto the index only tar
Tars use separators between entries to mark the boundaries, using cat directly will make it so that the tar with just the index file will have what looks like an end of archive marker in it. valhalla is robust to this and will read past it to get the rest of the archive. if you want you could use tar --concatinate to avoid this problem but you'll also need to subtract the size of one tar block (512 bytes) from the byte offsets you put in the index
If we do this and we update tile_extract_t to read the index if its the first thing in the archive then we can assume that reloading takes only milliseconds and no longer requires a singleton to hold the memory map. I'm not sure if there is any issue with race conditions over creating the first memory map of a file but in the end threads should end up sharing the same map provided to them by the kernel. We should probably look into that, https://man7.org/linux/man-pages/man2/mmap.2.html seems to mention using MAP_FIXED_NOREPLACE potentially but i havent read it carefully. Reading a bit more i think the only thing that will happen is that different threads and processes will just get a different (virtual) address to the mapped memory but behind the scenes the kernel will map them to the same physical address in memory