Reuse MappingMetadata instances in Metadata class. by martijnvg · Pull Request #80348 · elastic/elasticsearch

martijnvg · 2021-11-04T14:49:33Z

Hash the mapping source of a MappingMetadata instance and then
cache it in Metadata class. A mapping with the same hash
will use a cached MappingMetadata instance. This can
significantly reduce the number of MappingMetadata instances
for data streams and index patterns.

The idea originated from #69772, but this PR focuses only on the JVM heap memory savings,
and it hashes the mapping instead of assigning it a UUID.

Relates to #77466
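The deduplication described above can be sketched in Python (the language of the test script further down this thread). This is a minimal toy model under stated assumptions, not the Elasticsearch implementation: `MappingMetadata` here is a simplified stand-in, `MappingCache` and `dedupe` are hypothetical names, and sorted-key JSON is assumed as the deterministic serialization that gets hashed.

```python
import hashlib
import json
from typing import Any, Dict


class MappingMetadata:
    """Toy stand-in for MappingMetadata: a mapping plus its SHA-256 digest."""

    def __init__(self, mapping: Dict[str, Any]):
        # Assumption: sorted keys give a deterministic serialization, so
        # equal maps always produce identical bytes (and identical digests).
        self.source = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
        self.digest = hashlib.sha256(self.source.encode("utf-8")).hexdigest()


class MappingCache:
    """Hypothetical cache: digest -> the single shared MappingMetadata instance."""

    def __init__(self):
        self._by_digest: Dict[str, MappingMetadata] = {}

    def dedupe(self, mapping: MappingMetadata) -> MappingMetadata:
        # Return the already-cached instance for this digest if there is one,
        # otherwise register (and return) the instance that was passed in.
        return self._by_digest.setdefault(mapping.digest, mapping)

    def __len__(self) -> int:
        return len(self._by_digest)
```

With this shape, every backing index of a data stream whose mapping is identical ends up pointing at the same object, so the number of live instances tracks the number of *distinct* mappings rather than the number of indices.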


martijnvg · 2021-11-05T16:40:32Z

Locally, I have been testing this improvement with the script at the end of this comment. The script creates five data streams and rolls them over concurrently (1000 rollovers in total). With this change, the number of MappingMetadata instances is 3 (when running the script on an empty cluster); without this change, there are over 2000 instances.

```python
import threading
from elasticsearch import Elasticsearch

ds_base_name = 'logs-mysql-'
es = Elasticsearch(['http://localhost:9200'], http_auth=('elastic-admin', 'elastic-password'), timeout=600)


def rollover_test(ds_name):
    # Recreate the data stream from scratch, then roll it over 200 times.
    es.indices.delete_data_stream(name=ds_name, ignore=[400, 404])
    es.indices.create_data_stream(name=ds_name)
    for _ in range(200):
        es.indices.rollover(alias=ds_name, wait_for_active_shards=0)


if __name__ == "__main__":
    # Every rollover adds a backing index, so lift the per-node shard limit.
    es.cluster.put_settings(body={"persistent": {"cluster.max_shards_per_node": 100000}})
    es.indices.put_index_template(name='my-template', body={
        "index_patterns": ["logs-*-*"],
        "priority": 200,
        "data_stream": {},
        "composed_of": [
            "logs-mappings",
            "data-streams-mappings"
        ],
        "template": {
            "settings": {
                "index.number_of_replicas": 0
            }
        },
        "allow_auto_create": True
    })

    # 5 data streams x 200 rollovers = 1000 rollovers, run concurrently.
    threads = [threading.Thread(target=rollover_test, args=(ds_base_name + str(i),)) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```


elasticmachine · 2021-11-08T08:24:51Z

Pinging @elastic/es-data-management (Team:Data Management)

martijnvg · 2021-11-08T08:46:26Z

server/src/main/java/org/elasticsearch/cluster/metadata/Metadata.java

+            }
+        }
+
+        private void cleanupUnusedEntry(IndexMetadata previous) {


Because the cache is passed down to new Metadata instances, a cleanup mechanism is needed.
This approach, which counts how often a MappingMetadata instance is used and removes an entry once it has no usages, seems cheaper than checking, every time a Metadata is built, whether each entry is still used and then removing the unused entries. But this approach is also more complex. I'm also okay with changing the cleanup logic to be simpler: check for each cache entry whether it is used by any of the indices, and remove it if unused.
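For comparison, the usage-counting approach could look roughly like the toy below. It is a hypothetical sketch (the class and method names are invented, not taken from the PR): each index change costs O(1), but every code path that adds, replaces or deletes an index must call `release` exactly once, otherwise the counts drift and entries leak or disappear early.

```python
from collections import Counter
from typing import Dict


class RefCountingCache:
    """Hypothetical cache that tracks how many indices use each mapping digest."""

    def __init__(self):
        self.entries: Dict[str, object] = {}
        self.usages: Counter = Counter()

    def acquire(self, digest: str, mapping: object) -> object:
        # Reuse the existing instance for this digest, or register a new one,
        # and record one more index that uses it.
        shared = self.entries.setdefault(digest, mapping)
        self.usages[digest] += 1
        return shared

    def release(self, digest: str) -> None:
        # Must be called when an index using this digest is removed or remapped.
        self.usages[digest] -= 1
        if self.usages[digest] == 0:
            # No index uses this mapping any more: drop it from the cache.
            del self.usages[digest]
            del self.entries[digest]
```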

If the simpler version doesn't come with problematic runtime overhead it might be fine? It seems like it should be cheap now that we can cheaply compare mappings?

I think that is true. Also I find this implementation a bit fragile. I will change the cleanup implementation to just check for each cache entry whether it is used by any IndexMetadata.

Implemented: 15b05af
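The simpler rule (drop every cache entry that no index uses) boils down to a set-membership scan. A minimal sketch, assuming each index exposes the digest of its mapping; `purge_mapping_cache` is a hypothetical Python name, loosely echoing the Java `purgeMappingCache` method whose body is not shown in this thread.

```python
from typing import Dict


def purge_mapping_cache(cache: Dict[str, object], indices: Dict[str, str]) -> None:
    """Remove cache entries that no index references any more.

    cache:   mapping digest -> shared mapping instance
    indices: index name -> digest of that index's mapping (assumed input)
    """
    in_use = set(indices.values())
    # Iterate over a snapshot of the keys so entries can be deleted in the loop.
    for digest in list(cache):
        if digest not in in_use:
            del cache[digest]
```

This is one pass over the indices plus one pass over the cache, and each check only compares short digest strings, which is why it stays cheap once mappings can be compared by digest.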

original-brownbear

Thanks Martijn, this looks really nice and should be pretty impactful. I just have a couple of questions, suggestions on the plumbing around this mostly :) Happy to discuss this on another channel if that's quicker as well.

original-brownbear · 2021-11-08T11:28:49Z

server/src/main/java/org/elasticsearch/cluster/metadata/MappingMetadata.java

+        String mappingAsString = Strings.toString((builder, params) -> builder.mapContents(mapping));
        try {
-            this.source = new CompressedXContent((builder, params) -> builder.mapContents(mapping));
+            this.source = new CompressedXContent(mappingAsString);


Can't we use this.source = new CompressedXContent((builder, params) -> builder.mapContents(mapping)); here still and compute the digest as we compress for example since we stream the bytes in that case anyway?

On that note, I wonder if we should just make the Sha256 value a part of the CompressedXContent? I see how we do some magic here to make sure we build a deterministic version based on a map, but we could have that by just adding a static constructor to this thing? For BwC we can compute the CRC or SHA1 dynamically where/if needed I guess.
Also, that would allow us to make the equals in CompressedXContent almost free by just turning it into a string comparison :) WDYT?
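Computing the digest while compressing means the bytes are only streamed once. A rough Python illustration of that idea, not the Java code: `zlib` stands in for whatever compressor CompressedXContent actually uses, `compress_and_digest` is a hypothetical helper, and the base64 encoding of the digest mirrors the later commit "base64 encode sha256 digest".

```python
import base64
import hashlib
import zlib
from typing import Iterable, Tuple


def compress_and_digest(chunks: Iterable[bytes]) -> Tuple[bytes, str]:
    """Deflate a stream of byte chunks and SHA-256 it in a single pass.

    Returns the compressed bytes and the base64-encoded digest of the
    *uncompressed* content.
    """
    compressor = zlib.compressobj()
    sha = hashlib.sha256()
    out = []
    for chunk in chunks:
        sha.update(chunk)                        # hash the raw bytes...
        out.append(compressor.compress(chunk))   # ...and compress the same bytes
    out.append(compressor.flush())
    return b"".join(out), base64.b64encode(sha.digest()).decode("ascii")
```

Because the digest covers the uncompressed bytes, how the input happens to be split into chunks does not change it.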

I like this idea and in fact I did think about this while working on this. I will try this out and see whether it is not too difficult.

I did the first part of this: c770523
I will work on making the sha256 really part of CompressedXContent, just like the crc32 checksum.
Just to double check, do we want to keep the crc32 checksum? I think so.

Just to double check, do we want to keep the crc32 checksum? I think so.

Not sure tbh. It seems we're only using it to speed up comparisons and as a hashCode. Both of which we can have from the sha256 value can't we?

Right, I thought there was another usage of the crc32, but it is only used for hashCode and for checking consistency in assertions. Both can be done with sha256 too. I will replace crc32 completely with sha256.

I pushed: 1961453
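With the SHA-256 digest stored next to the compressed bytes, equality and hashing no longer need to decompress anything. A toy Python model of that design; `ToyCompressedXContent` is a hypothetical name, not the Java API, and it assumes the input is already in a deterministic serialization (otherwise equal maps could produce different bytes and therefore different digests).

```python
import base64
import hashlib
import zlib


class ToyCompressedXContent:
    """Toy model: compressed bytes plus a SHA-256 digest of the raw content."""

    def __init__(self, raw: bytes):
        self.compressed = zlib.compress(raw)
        self.sha256 = base64.b64encode(hashlib.sha256(raw).digest()).decode("ascii")

    def uncompressed(self) -> bytes:
        return zlib.decompress(self.compressed)

    def __eq__(self, other):
        # Equal digests imply equal content (barring a SHA-256 collision),
        # so comparison is a plain string comparison with no decompression.
        if not isinstance(other, ToyCompressedXContent):
            return NotImplemented
        return self.sha256 == other.sha256

    def __hash__(self):
        # The hash code is derived from the digest, taking over the role
        # a CRC32 checksum would otherwise play.
        return hash(self.sha256)
```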

original-brownbear · 2021-11-08T11:29:25Z

server/src/main/java/org/elasticsearch/cluster/metadata/MappingMetadata.java

-            Map<String, Object> routingNode = (Map<String, Object>) withoutType.get("_routing");
-            for (Map.Entry<String, Object> entry : routingNode.entrySet()) {
-                String fieldName = entry.getKey();
+            Map<?, ?> routingNode = (Map<?, ?>) withoutType.get("_routing");


Why can't we have String keys here anymore?

I was a bit annoyed with the SuppressWarnings("unchecked") and so removed it. I can undo this change and make this change in another pr.

Maybe, this one is a little too impactful to have additional noise in it I think :)

Ok, I will undo this bit here in the pr.

Removed this change: de25c02

original-brownbear · 2021-11-08T11:32:29Z

server/src/main/java/org/elasticsearch/cluster/metadata/Metadata.java

+            }
+        }
+
+        private void cleanupUnusedEntry(IndexMetadata previous) {


If the simpler version doesn't come with problematic runtime overhead it might be fine? It seems like it should be cheap now that we can cheaply compare mappings?

original-brownbear · 2021-11-11T16:55:17Z

server/src/main/java/org/elasticsearch/common/compress/CompressedXContent.java

@@ -43,43 +43,43 @@ public final class CompressedXContent {
    private static final ThreadLocal<InflaterAndBuffer> inflater1 = ThreadLocal.withInitial(InflaterAndBuffer::new);
    private static final ThreadLocal<InflaterAndBuffer> inflater2 = ThreadLocal.withInitial(InflaterAndBuffer::new);


@martijnvg I just realized, we can probably remove all of this infrastructure here now that we have efficient equals? I added this only to speed up comparison a while back, but now it seems irrelevant and can be simplified?

Yes, I've removed inflater2 and equalsWhenUncompressed() method: c05791c

inflater1 and a few other things are still in use by production code.


martijnvg · 2021-11-23T11:36:50Z

@elasticmachine run elasticsearch-ci/part-2

original-brownbear

LGTM, I think I'd like the index-metadata constructor to be set up differently just to enable follow-ups but it's your call, I can deal with that myself in a follow-up when I work on it again :)
Thanks so much for the iterations on this one!

original-brownbear · 2021-11-24T15:48:20Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java

        assert numberOfShards * routingFactor == routingNumShards : routingNumShards + " must be a multiple of " + numberOfShards;
    }

+    IndexMetadata(IndexMetadata indexMetadata, MappingMetadata mapping) {


Could we maybe make this just a method:

IndexMetadata withMappingMetadata(MappingMetadata mapping) to give us an obvious copy constructor. I also wonder, even though it's slightly more code, maybe it's better if this would just invoke the existing constructor so we don't have two constructors to maintain for this thing (given how additional fields are incoming from settings moving to fields in this class)?

👍 I agree this is cleaner. I've made this change: 7d09282
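The "copy via a method that delegates to the one existing constructor" pattern has a close Python analogue in `dataclasses.replace`, which builds the copy by calling the regular `__init__`. A sketch under assumptions: the fields are illustrative only, `with_mapping_metadata` merely mirrors the Java method name, and returning `self` when the mapping is unchanged is an extra optimisation of this sketch, not something stated in the PR.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class IndexMetadata:
    """Toy immutable index metadata; field names are illustrative only."""
    index: str
    number_of_shards: int
    mapping: Any

    def with_mapping_metadata(self, mapping: Any) -> "IndexMetadata":
        # Assumption of this sketch: skip the copy when nothing changes.
        if mapping is self.mapping:
            return self
        # dataclasses.replace goes through __init__, so there is still only
        # one constructor to keep in sync when new fields are added.
        return replace(self, mapping=mapping)
```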

original-brownbear · 2021-11-24T15:49:12Z

server/src/main/java/org/elasticsearch/cluster/metadata/Metadata.java

+
+        private void purgeMappingCache(ImmutableOpenMap<String, IndexMetadata> indices) {
+            final var iterator = cache.entrySet().iterator();
+            while (iterator.hasNext()) {


elasticsearchmachine · 2021-11-25T09:50:41Z

💔 Backport failed

Status	Branch	Result
❌	8.0	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 80348

Backporting elastic#80348 to 8.0 branch. Hash the mapping source of a MappingMetadata instance and then cache it in Metadata class. A mapping with the same hash will use a cached MappingMetadata instance. This can significantly reduce the number of MappingMetadata instances for data streams and index patterns. Idea originated from elastic#69772, but just focuses on the JVM heap memory savings. And hashes the mapping instead of assigning it a UUID. Relates to elastic#77466


elasticsearchmachine added the v8.1.0 label Nov 4, 2021

martijnvg added 2 commits November 5, 2021 10:02

rename

bbd688a

martijnvg force-pushed the reuse_mapping_metadata branch from c5b17aa to bbd688a Compare November 5, 2021 16:44

martijnvg added 15 commits November 5, 2021 19:47

Merge remote-tracking branch 'es/master' into reuse_mapping_metadata

ad5760d

missed a few places where mapping reuse was performed, causing inconsistencies.

f31cbf8

cleanup

131fa72

more cleanup

9c31801

moar cleanup

8d787c5

Merge remote-tracking branch 'es/master' into reuse_mapping_metadata

6422393

adjust cleanup logic

6f57100

added unit test

283aca1

Merge remote-tracking branch 'es/master' into reuse_mapping_metadata

61dd8ca

undo unrelated change

7e6553a

digest can be null in case of older clusters

0572df1

fixed test

7592c32

fixed npe in mixed clusters

441db8a

compute digest in bwc case

2059cb7

iter

873517c

martijnvg requested a review from original-brownbear November 8, 2021 08:24

martijnvg marked this pull request as ready for review November 8, 2021 08:24

martijnvg added :Data Management/Indices APIs DO NOT USE. Use ":Distributed/Indices APIs" or ":StorageEngine/Templates" instead. v8.0.0 labels Nov 8, 2021

elasticmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Nov 8, 2021

martijnvg added >enhancement and removed Team:Data Management (obsolete) DO NOT USE. This team no longer exists. labels Nov 8, 2021

martijnvg commented Nov 8, 2021


original-brownbear reviewed Nov 8, 2021


martijnvg added 2 commits November 11, 2021 16:59

rename

3bd8a65

undo unrelated change

de25c02

original-brownbear reviewed Nov 11, 2021


martijnvg added 4 commits November 12, 2021 00:07

base64 encode sha256 digest

17f00d9

removed equalsWhenUncompressed() method

c05791c

make cleanup unused entries less hard

3d2bf90

replace builder with a new copy constructor that only changes the mapping metadata instance

4bc2229

martijnvg requested a review from original-brownbear November 12, 2021 12:08

Merge remote-tracking branch 'es/master' into reuse_mapping_metadata

0822c69

Merge remote-tracking branch 'es/master' into reuse_mapping_metadata

883d356

original-brownbear approved these changes Nov 24, 2021


martijnvg added 2 commits November 25, 2021 09:37

Merge remote-tracking branch 'es/master' into reuse_mapping_metadata

7f37cd3

Turn copy constructor into method (withMappingMetadata(...))

7d09282

martijnvg added the auto-backport-and-merge label Nov 25, 2021

martijnvg merged commit c67b470 into elastic:master Nov 25, 2021

martijnvg added the backport pending label Nov 25, 2021

martijnvg mentioned this pull request Nov 25, 2021

[8.0] Reuse MappingMetadata instances in Metadata class. #81036

Merged

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Nov 25, 2021

Disable bwc tests in order to backport elastic#80348

75e0d1a

martijnvg added a commit that referenced this pull request Nov 25, 2021

Disable bwc tests in order to backport #80348 (#81046)

a139aff

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Nov 25, 2021

Enable bwc tests after backport elastic#80348 to 8.0 branch.

a9825ed

martijnvg removed the backport pending label Nov 25, 2021

mark-vieira added v8.0.0-rc1 and removed v8.0.0 labels Jan 12, 2022

DaveCTurner mentioned this pull request Jan 12, 2026

[Cluster State] Combine Index Mappings when identical #140422

Closed

5 participants
