More Compact Serialization of Metadata #82608
Merged
original-brownbear merged 1 commit into elastic:master from original-brownbear:efficient-serialization-metadata-over-wire on Jan 14, 2022
Conversation
Serialize the map of hashes to mappings and then look the mappings up from that map, instead of serializing them over and over for each index, to make full cluster state transport messages much smaller in the common case of many duplicate mappings.
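The scheme can be sketched outside of Elasticsearch as follows. Everything here is illustrative: the class name, the string-based "wire format", and the use of String#hashCode as the mapping hash are stand-ins, not the actual StreamOutput-based serialization in the PR. The point is only the shape of the idea: each distinct mapping body is written once into a hash-keyed table, and each index then refers to its mapping by hash.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of hash-keyed mapping deduplication (hypothetical types,
// not Elasticsearch's actual StreamOutput/StreamInput serialization).
public class MappingDedupSketch {

    // Writes (1) the table of distinct hash -> mapping entries, then
    // (2) one hash reference per index instead of the full mapping body.
    static String serialize(Map<String, String> mappingByIndex) {
        Map<Integer, String> mappingsByHash = new LinkedHashMap<>();
        StringBuilder perIndex = new StringBuilder();
        for (Map.Entry<String, String> e : mappingByIndex.entrySet()) {
            int hash = e.getValue().hashCode(); // stand-in for the mapping hash
            mappingsByHash.putIfAbsent(hash, e.getValue());
            perIndex.append(e.getKey()).append('=').append(hash).append(';');
        }
        return mappingsByHash + "|" + perIndex;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        String shared = "{\"properties\":{\"f\":{\"type\":\"keyword\"}}}";
        mappings.put("index-1", shared);
        mappings.put("index-2", shared); // duplicate mapping, common in practice
        String wire = serialize(mappings);
        // The shared mapping body appears exactly once in the output.
        System.out.println(wire.indexOf("keyword") == wire.lastIndexOf("keyword")); // prints "true"
    }
}
```

With many indices sharing a handful of mapping templates, the per-index cost collapses to a hash reference, which is what shrinks the full cluster state messages.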
Collaborator
Pinging @elastic/es-distributed (Team:Distributed)
idegtiarenko approved these changes on Jan 14, 2022
arteam reviewed on Jan 14, 2022
if (in.getVersion().onOrAfter(MAPPINGS_AS_HASH_VERSION)) {
    final int mappings = in.readVInt();
    if (mappings > 0) {
        final Map<String, MappingMetadata> mappingMetadataMap = new HashMap<>(mappings);
Contributor
The HashMap constructor accepts the initial capacity, not the expected number of elements. It needs to be sized somewhat higher than mappings; otherwise the map will be resized/rehashed while it is filled.
See https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/Maps.java#L273
Contributor
Author
True, though I guess it might be worthwhile to have a general fix for this. We seem to always pre-size with capacity == element count in deserialization. Technically, we probably could move to accounting for the load factor, but I wouldn't expect too much from it (especially when the key's hashCode is essentially free).
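For reference, the sizing the review points at can be sketched like this. The capacityFor helper below is hypothetical, mirroring the formula in the linked Guava Maps.java: HashMap(int) sets the hash table capacity, and the map rehashes once its size exceeds capacity × load factor (0.75 by default), so the capacity has to be padded for the load factor if resizes are to be avoided.

```java
// Sketch of the capacity-vs-expected-size distinction. capacityFor is a
// hypothetical helper mirroring Guava's Maps.newHashMapWithExpectedSize:
// it picks a capacity whose resize threshold (capacity * 0.75) covers
// `expected` insertions, so the map never rehashes while being filled.
public class HashMapSizing {

    static int capacityFor(int expected) {
        if (expected < 3) {
            return expected + 1; // tiny maps: capacity of expected + 1 suffices
        }
        // Dividing by the default load factor makes the threshold >= expected.
        return (int) ((float) expected / 0.75f + 1.0f);
    }

    public static void main(String[] args) {
        // new HashMap<>(16) gets a 16-bucket table with a resize threshold of
        // 12, so inserting 16 entries forces a rehash part-way through.
        // capacityFor(16) == 22 rounds up to a 32-bucket table (threshold 24),
        // which holds all 16 entries without resizing.
        System.out.println(capacityFor(16)); // prints "22"
    }
}
```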
Contributor
Author
Thanks Ievgen!
This should make the master-node impact of requests for the full cluster state (or at least the state including mappings) quite a bit cheaper in terms of memory, CPU, and network usage. It also saves a lot of buffers on the coordinating/sending node, as well as the CPU spent deduplicating mappings.
relates #77466