De-Singleton tile_extract_t to Allow for Config (tileset) Reloading

At the moment graphreader uses a singleton especially for the most performant mode of operation, memory mapped tar file. This is done in large part because scanning and enumerating what is in the tar takes a decent amount of time if the file isnt already memory mapped. In this way the singleton is filling its typical role as a cache for other threads to make use of. This has a pretty steep down side though which is that once the extract is loaded its not possible to load another one. We've worked around this for unit tests by subclassing graphreader and adding methods to it to allow this: https://github.com/valhalla/valhalla/blob/916cea1454646917fd5c9fbc0c49087ae35208ce/test/test.cc#L404-L416

This solution is not scalable though in a multithreaded context so we must do something else to make this possible.

You'll notice that the actual thing we are talking about here is the `tile_extract_t` struct which houses both the extracts (graph tiles as well as speed tiles for real time speed). Both files reflect each other in terms of organization, a tar with tiles in it each of which has a file name that tells which tile it is. What costs time is running over the whole file to discover what and where tiles are located in the file. If getting this information were trivial the process could be sped up such that a singlton wouldnt be needed.

Many years ago we had such an implementation, not in valhalla but non-the-less for memory mapping a large file. in this implementation we passed an index which told where the bits were in the large file so that the program didnt need to scan it first. here we could do the same thing. pass an index with the tar that allows the `tile_extract_t` construct itself in very short time. The index would need a map, like that is found in the extract, of tile_id to byte_offset within the large tar file.

I think it would be nice not to require this index but to use it if its there. This means that configuring will take a long time in the case that it is not supplied but i dont see any other way to be lenient. the other option is to not be lenient and error when no index is provided.

The other question is how to provide the index. It would be really cool if we could make it the first file in the tar extract. this would require some trickery. i believe this could be done the following way:

1. create the tar archive as normal with all the files in it
2. enumerate the files and their byte offsets in an index file, format should be fixed sized struct {uint32_t tile_id, uint64_t byte_offset}
3. create a tar with only the index file in it
4. modify the index file by shifting all the offsets by the size of the tar with just the index file in it
5. recreate tar file with just the index in it (index is updated)
6. cat the larger tar with all the tiles in it onto the index only tar

Tars use separators between entries to mark the boundaries, using `cat` directly will make it so that the tar with just the index file will have what looks like an `end of archive` marker in it. valhalla is robust to this and will read past it to get the rest of the archive. if you want you could use `tar --concatinate` to avoid this problem but you'll also need to subtract the size of one tar block (512 bytes) from the byte offsets you put in the index

If we do this and we update tile_extract_t to read the index if its the first thing in the archive then we can assume that reloading takes only milliseconds and no longer requires a singleton to hold the memory map. I'm not sure if there is any issue with race conditions over creating the first memory map of a file but in the end threads should end up sharing the same map provided to them by the kernel. We should probably look into that, https://man7.org/linux/man-pages/man2/mmap.2.html seems to mention using `MAP_FIXED_NOREPLACE` potentially but i havent read it carefully. Reading a bit more i think the only thing that will happen is that different threads and processes will just get a different (virtual) address to the mapped memory but behind the scenes the kernel will map them to the same physical address in memory

	std::shared_ptr<valhalla::baldr::GraphReader>
	make_clean_graphreader(const boost::property_tree::ptree& mjolnir_conf) {

	// Wrapper sub-class to allow replacing the statically initialized
	// tile_extract member variable
	struct ResettingGraphReader : valhalla::baldr::GraphReader {
	ResettingGraphReader(const boost::property_tree::ptree& pt) : GraphReader(pt) {
	// Reset the statically initialized tile_extract_ member variable
	tile_extract_.reset(new valhalla::baldr::GraphReader::tile_extract_t(pt));
	}
	};
	return std::make_shared<ResettingGraphReader>(mjolnir_conf);
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

De-Singleton tile_extract_t to Allow for Config (tileset) Reloading #3117

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

De-Singleton tile_extract_t to Allow for Config (tileset) Reloading #3117

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions