Conversation
|
@elasticmachine update branch |
DaveCTurner
left a comment
There was a problem hiding this comment.
Just a few small comments on the cluster state persistence changes
| private static final String DATA_FIELD_NAME = "data"; | ||
| private static final String GLOBAL_TYPE_NAME = "global"; | ||
| private static final String INDEX_TYPE_NAME = "index"; | ||
| private static final String MAPPING_TYPE_NAME = "mapping"; |
There was a problem hiding this comment.
Please remember to update the comment at the top describing the new schema :)
|
|
||
| Map<String, MappingMetadata> mappings = new HashMap<>(); | ||
| consumeFromType(searcher, MAPPING_TYPE_NAME, bytes -> { | ||
| MappingMetadata mappingMetadata = MappingMetadata.fromXContent(XContentFactory.xContent(XContentType.SMILE) |
There was a problem hiding this comment.
Could we include trace logging for loading each mapping too?
| if (indexUUIDs.add(indexMetadata.getIndexUUID()) == false) { | ||
| throw new IllegalStateException("duplicate metadata found for " + indexMetadata.getIndex() + " in [" + dataPath + "]"); | ||
| } | ||
| if (indexMetadata.mapping() != null && mappings.containsKey(indexMetadata.mapping().id())) { |
There was a problem hiding this comment.
Can we enforce some stronger invariant here? E.g. if the mapping has an ID then it must be in mappings? Maybe also that the rest of the mapping serialized with the index is empty?
It looks like this transparently detects when the mappings are the same. From the "outside" this is invisible. This makes me quite happy. |
Hash the mapping source of a MappingMetadata instance and then cache it in Metadata class. A mapping with the same hash will use a cached MappingMetadata instance. This can significantly reduce the number of MappingMetadata instances for data streams and index patterns. Idea originated from #69772, but just focusses on the jvm heap memory savings. And hashes the mapping instead of assigning it an uuid. Relates to #77466
Backporting elastic#80348 to 8.0 branch. Hash the mapping source of a MappingMetadata instance and then cache it in Metadata class. A mapping with the same hash will use a cached MappingMetadata instance. This can significantly reduce the number of MappingMetadata instances for data streams and index patterns. Idea originated from elastic#69772, but just focusses on the jvm heap memory savings. And hashes the mapping instead of assigning it an uuid. Relates to elastic#77466
Backporting #80348 to 8.0 branch. Hash the mapping source of a MappingMetadata instance and then cache it in Metadata class. A mapping with the same hash will use a cached MappingMetadata instance. This can significantly reduce the number of MappingMetadata instances for data streams and index patterns. Idea originated from #69772, but just focusses on the jvm heap memory savings. And hashes the mapping instead of assigning it an uuid. Relates to #77466
This change modifies cluster state to avoid duplication of mappings between indices. This helps when there are many indices sharing the same big mappings. This should lower memory consumption and speed up cluster state updates as there will be less things to write each time. Deduplication is done on 2 levels:
org.elasticsearch.cluster.metadata.Metadata.Builderwhich makes sure that the same mappings use the same instanceUsing following test on master vs this branch there's 2.34x reduction in cluster state size (1508kB vs 644kB):
Possible changes/improvements:
Metadatainstead of building it inMetadata.Builderso we don't have to rebuild it every time (it's rather quick process though)CompressedXContentso we can deduplicate mappings inIndexMetadataandIndexTemplateMetadata(not a big gain if there are many indices using the same template)