Reuse MappingMetadata instances in Metadata class. #80348
martijnvg merged 33 commits into elastic:master
Conversation
Hash the mapping source of a MappingMetadata instance and then cache it in Metadata class. A mapping with the same hash will use a cached MappingMetadata instance. This can significantly reduce the number of MappingMetadata instances for data streams and index patterns. Relates to elastic#77466
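The deduplication idea can be sketched as a content-addressed cache: hash the mapping source and hand out one shared instance per distinct hash. A minimal Python sketch (illustrative only; `MappingCache` and `intern` are made-up names, not the actual Java implementation in the `Metadata` class):

```python
import hashlib

class MappingCache:
    """Share one mapping instance per distinct mapping source."""

    def __init__(self):
        self._by_digest = {}  # sha256 hex digest -> shared mapping source

    def intern(self, mapping_source: bytes) -> bytes:
        digest = hashlib.sha256(mapping_source).hexdigest()
        # Byte-identical mappings produce the same digest, so all indices
        # created from the same template end up sharing one instance.
        return self._by_digest.setdefault(digest, mapping_source)

cache = MappingCache()
a = cache.intern(b'{"properties":{"msg":{"type":"text"}}}')
b = cache.intern(b'{"properties":{"msg":{"type":"text"}}}')
assert a is b  # reused, not merely equal
```

For a data stream with hundreds of rolled-over backing indices sharing one template, this collapses hundreds of identical mapping copies into a single cached instance.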
Locally I have been testing this improvement with the script at the end of this comment. This script creates five data streams and rolls each over 200 times concurrently (1000 rollovers in total). With this change, the number of distinct MappingMetadata instances drops significantly.

```python
import threading
from elasticsearch import Elasticsearch

ds_base_name = 'logs-mysql-'
es = Elasticsearch(['http://localhost:9200'], http_auth=('elastic-admin', 'elastic-password'), timeout=600)

def rollover_test(ds_name):
    es.indices.delete_data_stream(name=ds_name, ignore=[400, 404])
    es.indices.create_data_stream(name=ds_name)
    for i in range(200):
        es.indices.rollover(alias=ds_name, wait_for_active_shards=0)

if __name__ == "__main__":
    es.cluster.put_settings(body={"persistent": {"cluster.max_shards_per_node": 100000}})
    es.indices.put_index_template(name='my-template', body={
        "index_patterns": ["logs-*-*"],
        "priority": 200,
        "data_stream": {},
        "composed_of": [
            "logs-mappings",
            "data-streams-mappings"
        ],
        "template": {
            "settings": {
                "index.number_of_replicas": 0
            }
        },
        "allow_auto_create": True
    })
    for i in range(5):
        t = threading.Thread(target=rollover_test, args=(ds_base_name + str(i),))
        t.start()
```
Pinging @elastic/es-data-management (Team:Data Management)
On `private void cleanupUnusedEntry(IndexMetadata previous)`:
Because the cache is passed down to new Metadata instances, a cleanup mechanism is needed.
This approach of counting how many times a MappingMetadata instance is used and removing entries when there are no usages left seems cheaper than checking, each time metadata is built, whether all entries are still used and then removing the unused ones. But this approach is also more complex. I'm also okay with changing the cleanup logic to something simpler: check each cache entry for whether it is used by any of the indices and remove it if unused.
If the simpler version doesn't come with problematic runtime overhead, it might be fine? It seems like it should be cheap now that we can cheaply compare mappings?
I think that is true. Also, I find this implementation a bit fragile. I will change the cleanup implementation to just check, for each cache entry, whether it is used by any IndexMetadata.
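The simpler sweep this thread settles on can be sketched as follows (illustrative Python; in the real change the cache lives in the Metadata builder and the usage check walks the IndexMetadata instances):

```python
def purge_unused(cache, indices):
    """Drop cache entries whose digest no longer backs any index."""
    in_use = {index["mapping_digest"] for index in indices}
    for digest in list(cache):  # copy keys; we delete while iterating
        if digest not in in_use:
            del cache[digest]

cache = {"aaa": "mapping-1", "bbb": "mapping-2"}
indices = [{"mapping_digest": "aaa"}]
purge_unused(cache, indices)
assert list(cache) == ["aaa"]
```

This trades a full scan on each cleanup for not having to maintain per-entry reference counts, which is the fragility concern mentioned in the comment.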
original-brownbear left a comment:
Thanks Martijn, this looks really nice and should be pretty impactful. I just have a couple of questions and suggestions, mostly on the plumbing around this :) Happy to discuss this on another channel if that's quicker as well.
```diff
  String mappingAsString = Strings.toString((builder, params) -> builder.mapContents(mapping));
  try {
-     this.source = new CompressedXContent((builder, params) -> builder.mapContents(mapping));
+     this.source = new CompressedXContent(mappingAsString);
```
Can't we still use `this.source = new CompressedXContent((builder, params) -> builder.mapContents(mapping));` here and compute the digest as we compress, since we stream the bytes in that case anyway?
On that note, I wonder if we should just make the sha256 value part of CompressedXContent? I see how we do some magic here to make sure we build a deterministic version based on a map, but we could have that by just adding a static constructor to CompressedXContent. For BwC we can compute the CRC or SHA1 dynamically where/if needed, I guess.
Also, that would allow us to make the equals in CompressedXContent almost free by turning it into a string comparison :) WDYT?
I like this idea; in fact I thought about this while working on the PR. I will try it out and see whether it is not too difficult.
I did the first part of this: c770523
I will work on making the sha256 really part of CompressedXContent, just like the crc32 checksum.
Just to double check: do we want to keep the crc32 checksum? I think so.
> Just to double check, do we want to keep the crc32 checksum? I think so.

Not sure, tbh. It seems we're only using it to speed up comparisons and as a hashCode, both of which we can get from the sha256 value, can't we?
Right, I thought there was another usage of the crc32, but it is only used for hashCode and for checking consistency in assertions. Both can be done with sha256 too. I will replace crc32 completely with sha256.
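The direction agreed on here, carrying a sha256 digest inside the compressed wrapper so that equality and hashCode become digest comparisons with no decompression, can be sketched like this (illustrative Python, not the actual `CompressedXContent` API):

```python
import hashlib
import zlib

class CompressedContent:
    def __init__(self, raw: bytes):
        # The digest is computed once, at construction, from the raw bytes;
        # later comparisons never need to inflate the compressed data.
        self.sha256 = hashlib.sha256(raw).hexdigest()
        self.compressed = zlib.compress(raw)

    def __eq__(self, other):
        return isinstance(other, CompressedContent) and self.sha256 == other.sha256

    def __hash__(self):
        return hash(self.sha256)

a = CompressedContent(b'{"properties":{"msg":{"type":"text"}}}')
b = CompressedContent(b'{"properties":{"msg":{"type":"text"}}}')
assert a == b and hash(a) == hash(b)
assert zlib.decompress(a.compressed) == b'{"properties":{"msg":{"type":"text"}}}'
```

A sha256 digest also subsumes the crc32's two jobs discussed above (hashCode and consistency checks), which is why dropping the crc32 entirely is plausible.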
```diff
- Map<String, Object> routingNode = (Map<String, Object>) withoutType.get("_routing");
+ Map<?, ?> routingNode = (Map<?, ?>) withoutType.get("_routing");
  for (Map.Entry<String, Object> entry : routingNode.entrySet()) {
      String fieldName = entry.getKey();
```
Why can't we have String keys here anymore?
I was a bit annoyed by the `@SuppressWarnings("unchecked")`, so I removed it. I can undo this change and make it in another PR.
Maybe; this one is a little too impactful to have additional noise in it, I think :)
OK, I will undo this bit here in the PR.
In `CompressedXContent`:

```java
private static final ThreadLocal<InflaterAndBuffer> inflater1 = ThreadLocal.withInitial(InflaterAndBuffer::new);
private static final ThreadLocal<InflaterAndBuffer> inflater2 = ThreadLocal.withInitial(InflaterAndBuffer::new);
```
@martijnvg I just realized we can probably remove all of this infrastructure now that we have an efficient equals? I added this only to speed up comparisons a while back, but now it seems irrelevant and can be simplified?
Yes, I've removed inflater2 and the equalsWhenUncompressed() method: c05791c
inflater1 and a few other things are still used by production code.
@elasticmachine run elasticsearch-ci/part-2
original-brownbear left a comment:
LGTM. I think I'd like the IndexMetadata constructor to be set up differently just to enable follow-ups, but it's your call; I can deal with that myself in a follow-up when I work on it again :)
Thanks so much for the iterations on this one!
```java
assert numberOfShards * routingFactor == routingNumShards : routingNumShards + " must be a multiple of " + numberOfShards;
}

IndexMetadata(IndexMetadata indexMetadata, MappingMetadata mapping) {
```
Could we maybe make this just a method, `IndexMetadata withMappingMetadata(MappingMetadata mapping)`, to give us an obvious copy constructor? I also wonder, even though it's slightly more code, whether it's better for this to just invoke the existing constructor, so we don't have two constructors to maintain for this thing (given that additional fields are incoming as settings move to fields in this class)?
👍 I agree this is cleaner. I've made this change: 7d09282
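The shape agreed on here, a `withMappingMetadata`-style copy method that delegates to the single canonical constructor, can be sketched like this (illustrative Python analogue with made-up fields, not the real `IndexMetadata`):

```python
class IndexMetadata:
    def __init__(self, name, number_of_shards, mapping):
        self.name = name
        self.number_of_shards = number_of_shards
        self.mapping = mapping

    def with_mapping(self, mapping):
        # Route the copy through the one real constructor, so a newly
        # added field only needs to be threaded through a single place.
        return IndexMetadata(self.name, self.number_of_shards, mapping)

old = IndexMetadata("logs-000001", 1, {"version": 1})
new = old.with_mapping({"version": 2})
assert new.mapping == {"version": 2} and new.name == old.name
```

The alternative, a second copy constructor, would silently go stale whenever a new field is added and only one of the two constructors is updated.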
```java
private void purgeMappingCache(ImmutableOpenMap<String, IndexMetadata> indices) {
    final var iterator = cache.entrySet().iterator();
    while (iterator.hasNext()) {
```
💔 Backport failed
You can use sqren/backport to manually backport by running
Backporting elastic#80348 to 8.0 branch. Hash the mapping source of a MappingMetadata instance and then cache it in Metadata class. A mapping with the same hash will use a cached MappingMetadata instance. This can significantly reduce the number of MappingMetadata instances for data streams and index patterns. Idea originated from elastic#69772, but this just focuses on the jvm heap memory savings, and hashes the mapping instead of assigning it a uuid. Relates to elastic#77466