Uncommitted mapping updates should not affect existing indices by bleskes · Pull Request #21306 · elastic/elasticsearch

bleskes · 2016-11-03T13:30:19Z

When processing a mapping update, the master currently creates an IndexService and uses its mapper service to do the hard work. However, if the master is also a data node and already has an instance of IndexService, we currently reuse the MapperService of that instance. Sadly, since mapping updates change the in-memory objects, this means that a mapping change that can be rejected later on during cluster state publishing will leave a side effect on the index in question, bypassing the cluster state safety mechanism.

This commit removes this optimization and replaces the IndexService creation with a direct creation of a MapperService.

Also, this fixes an issue where multiple mapping updates from multiple shards for the same field caused unneeded cluster state publishing, as the current code always created a new cluster state.

This was discovered while researching #21189
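The side effect described in the first paragraph of the description can be sketched in isolation. This is a minimal illustration, not the Elasticsearch code: `LiveMappings` and `MappingUpdateSketch` are hypothetical stand-ins, and a plain map of field names to types stands in for a real mapping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the mapping state a data node is actively using.
// This is NOT the real MapperService.
class LiveMappings {
    private final Map<String, String> fields = new HashMap<>();

    LiveMappings(Map<String, String> initial) {
        fields.putAll(initial);
    }

    void merge(String field, String type) {
        fields.put(field, type); // mutates the in-memory state immediately
    }

    Map<String, String> snapshot() {
        return new HashMap<>(fields);
    }
}

class MappingUpdateSketch {
    // Unsafe: merges into the instance the node is already using.
    static Map<String, String> applyInPlace(LiveMappings live, String field, String type) {
        live.merge(field, type);
        return live.snapshot();
    }

    // Safer: merges into a throwaway copy built from the current state.
    static Map<String, String> applyOnScratchCopy(LiveMappings live, String field, String type) {
        LiveMappings scratch = new LiveMappings(live.snapshot());
        scratch.merge(field, type);
        return scratch.snapshot();
    }

    public static void main(String[] args) {
        LiveMappings live = new LiveMappings(Map.of("title", "text"));

        Map<String, String> proposed = applyOnScratchCopy(live, "views", "long");
        // If publishing the proposed state were rejected, the live mappings are untouched.
        System.out.println(live.snapshot().containsKey("views")); // false
        System.out.println(proposed.containsKey("views"));        // true

        applyInPlace(live, "views", "long");
        // Here the live instance has already changed, whether or not the update is committed.
        System.out.println(live.snapshot().containsKey("views")); // true
    }
}
```

The scratch copy costs an extra object per update, which is the optimization this pull request gives up in exchange for keeping uncommitted changes away from live indices.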

jpountz

I left some comments. It is great to get the same behaviour regardless of whether the master node is a data node too!

jpountz · 2016-11-03T13:33:36Z

core/src/main/java/org/elasticsearch/cluster/metadata/MetaDataMappingService.java

+                                    mapperService.merge(mapping.value.type(), mapping.value.source(),
                                        MapperService.MergeReason.MAPPING_RECOVERY, request.updateAllTypes());
                                }
+                                indexMapperServices.put(index, mapperService);


should we put it in the map before we perform the merging, so that it is closed in the finally block if the merging fails too?

good catch. will move
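The ordering suggested here can be sketched as follows, under stated assumptions: `TrackedResource` and `RegisterBeforeUseSketch` are hypothetical names, and a plain loop stands in for the `IOUtils.close(...)` call used in the actual change. The point is only that a resource registered before the risky call is still visible to the cleanup in `finally`.

```java
import java.io.Closeable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical closeable resource; stands in for a freshly created mapper service.
class TrackedResource implements Closeable {
    boolean closed = false;

    void merge(boolean fail) {
        if (fail) {
            throw new IllegalArgumentException("merge failed");
        }
    }

    @Override
    public void close() {
        closed = true;
    }
}

class RegisterBeforeUseSketch {
    // Returns the resource so the caller can inspect whether it was closed.
    static TrackedResource run(boolean failMerge) {
        Map<String, TrackedResource> opened = new HashMap<>();
        TrackedResource resource = new TrackedResource();
        try {
            opened.put("index", resource); // register first ...
            resource.merge(failMerge);     // ... so a failure here cannot leak the resource
        } catch (IllegalArgumentException e) {
            // Swallowed only to keep the demo short; real code would propagate the failure.
        } finally {
            for (TrackedResource r : opened.values()) {
                r.close();
            }
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println(run(false).closed); // true
        System.out.println(run(true).closed);  // true, even though merge threw
    }
}
```

Had `opened.put(...)` come after `merge(...)`, the failing case would leave the resource out of the map and therefore unclosed.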

jpountz · 2016-11-03T13:34:39Z

core/src/main/java/org/elasticsearch/cluster/metadata/MetaDataMappingService.java

                                String parentType = newMapper.parentFieldMapper().type();
                                if (parentType.equals(mapping.value.type()) &&
-                                        indexService.mapperService().getParentTypes().contains(parentType) == false) {
+                                    mapperService.getParentTypes().contains(parentType) == false) {


can you indent one level more? otherwise it's not obvious what belongs to the if statement and what belongs to the inner block

jpountz · 2016-11-03T13:35:44Z

core/src/main/java/org/elasticsearch/cluster/metadata/MetaDataMappingService.java

+                return ClusterState.builder(currentState).metaData(builder).build();
+            } else {
+                return currentState;
+            }


out of curiosity, does this optimization buy much or would eg. cluster state diffs notice that the mappings did not change?

diffs will indeed optimize the network transmission time away but we still force a global sync of all nodes of the cluster - the master will publish this new state (with a 2 phase commit - so two rounds) and wait for the nodes to process it. This means that if some node is a bit busy it slows things down for nothing.

++ I think this can buy us quite a fair bit of processing savings... yet, I wonder if we can improve the CS builder to detect this automatically (for sure not here)?
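The early return discussed in this thread can be illustrated with a small sketch. It assumes the caller decides whether to publish by comparing object identity of the old and new state, which is what the test later in this review checks; `SketchState` and `NoOpUpdateSketch` are hypothetical names, not Elasticsearch classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable state holder; stands in for a cluster state.
final class SketchState {
    final Map<String, String> mappings;

    SketchState(Map<String, String> mappings) {
        this.mappings = Collections.unmodifiableMap(new HashMap<>(mappings));
    }
}

class NoOpUpdateSketch {
    // Returns the SAME instance when the requested mapping is already present.
    static SketchState execute(SketchState current, String field, String type) {
        if (type.equals(current.mappings.get(field))) {
            return current; // nothing changed, so there is nothing to publish
        }
        Map<String, String> updated = new HashMap<>(current.mappings);
        updated.put(field, type);
        return new SketchState(updated);
    }

    // Assumption: publishing is triggered only when the instance differs.
    static boolean needsPublishing(SketchState before, SketchState after) {
        return before != after;
    }

    public static void main(String[] args) {
        SketchState initial = new SketchState(Map.of("title", "text"));
        SketchState first = execute(initial, "views", "long");
        SketchState second = execute(first, "views", "long"); // same update from another shard

        System.out.println(needsPublishing(initial, first)); // true
        System.out.println(needsPublishing(first, second));  // false
    }
}
```

With this shape, a second shard reporting the same new field produces no new state object and so no extra publishing round.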

jpountz · 2016-11-03T13:36:15Z

core/src/main/java/org/elasticsearch/index/IndexModule.java

                searchOperationListeners, indexOperationListeners);
    }

+    public MapperService newIndexMapperService(MapperRegistry mapperRegistry) throws IOException {


Can you add documentation that this mapper service may only be used for administrative purposes, and not e.g. actually parsing documents?

yep. added.

jpountz · 2016-11-03T13:36:51Z

core/src/main/java/org/elasticsearch/indices/IndicesService.java

                indicesQueriesRegistry, clusterService, client, indicesQueryCache, mapperRegistry, indicesFieldDataCache);
    }

+    public synchronized MapperService createIndexMapperService(IndexMetaData indexMetaData) throws IOException {


bleskes · 2016-11-03T16:08:24Z

thx @jpountz . I pushed another commit with all the feedback. Can you take another look?

s1monw · 2016-11-03T16:23:06Z

core/src/main/java/org/elasticsearch/index/IndexModule.java

+     */
+    public MapperService newIndexMapperService(MapperRegistry mapperRegistry) throws IOException {
+        return new MapperService(indexSettings, analysisRegistry.build(indexSettings),
+            new SimilarityService(indexSettings, similarities), mapperRegistry,


I wonder if we should throw an AssertionError here instead... it should not happen and should not be caught?

good question. I opted for UOE as otherwise I would have to either construct a QueryShardContext or return null that will explode later. I think this is the simplest?

Your answer doesn't make sense; just use throw new AssertionError("no index query shard context available"); instead?
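One practical difference between the two options in this thread is what a broad `catch (Exception e)` does with them. The sketch below uses hypothetical suppliers rather than the code under review: an `UnsupportedOperationException` is an `Exception` and can be swallowed by defensive callers, while an `AssertionError` is an `Error` and passes through such handlers.

```java
import java.util.function.Supplier;

class UnreachableSupplierSketch {
    // Hypothetical suppliers that must never be called in the administrative use case.
    static final Supplier<Object> UOE = () -> {
        throw new UnsupportedOperationException("no index query shard context available");
    };
    static final Supplier<Object> ASSERTION = () -> {
        throw new AssertionError("no index query shard context available");
    };

    // A caller that defensively catches Exception, as much application code does.
    static String callDefensively(Supplier<Object> supplier) {
        try {
            supplier.get();
            return "returned";
        } catch (Exception e) {
            return "swallowed " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(callDefensively(UOE)); // swallowed UnsupportedOperationException
        try {
            callDefensively(ASSERTION);
        } catch (AssertionError e) {
            System.out.println("escaped: " + e.getMessage());
        }
    }
}
```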

s1monw

left some comments

s1monw · 2016-11-03T16:25:11Z

core/src/main/java/org/elasticsearch/cluster/metadata/MetaDataMappingService.java

-                                indicesToClose.add(indexMetaData.getIndex());
-                                IndexService indexService = indicesService.createIndex(indexMetaData, Collections.emptyList());
+                            if (indexMapperServices.containsKey(indexMetaData.getIndex()) == false) {
+                                MapperService mapperService = indicesService.createIndexMapperService(indexMetaData);


I must have missed it but don't we have to close this now? I don't see where it's closed... also on exception we should close all opened ones?

it is closed in the finally block: IOUtils.close(indexMapperServices.values());

s1monw · 2016-11-03T16:26:12Z

core/src/main/java/org/elasticsearch/indices/IndicesService.java

+        final Index index = indexMetaData.getIndex();
+        final Predicate<String> indexNameMatcher = (indexExpression) -> indexNameExpressionResolver.matchesIndex(index.getName(), indexExpression, clusterService.state());
+        final IndexSettings idxSettings = new IndexSettings(indexMetaData, this.settings, indexNameMatcher, indexScopeSetting);
+        final IndexModule indexModule = new IndexModule(idxSettings, indexStoreConfig, analysisRegistry);


can we add a test that actually uses a custom field mapper and ensure that it's registered here?

I added a test

bleskes · 2016-11-13T19:15:54Z

@jpountz @s1monw thx for the feedback. I merged from master and added a test with plugins making sure custom fields and similarities are picked up. I tried to come up with the simplest test possible but please take a critical look.

s1monw

left two comments LGTM otherwise

s1monw · 2016-11-14T15:29:42Z

core/src/test/java/org/elasticsearch/cluster/metadata/MetaDataMappingServiceTests.java

+        ClusterState result2 = mappingService.putMappingExecutor.execute(result, Collections.singletonList(request))
+            .resultingState;
+
+        assertTrue(result == result2);


assertSame?

s1monw · 2016-11-14T15:30:00Z

core/src/test/java/org/elasticsearch/indices/IndicesServiceTests.java

+     * Tests that teh {@link MapperService} created by {@link IndicesService#createIndexMapperService(IndexMetaData)} contains
+     * custom types and similarities registered by plugins
+     */
+    public void testStandAloneMapperServiceWithPlugins() throws IOException {


bleskes · 2016-11-15T09:47:09Z

Thx @s1monw , @jpountz . I'll merge it into master and wait a day before back porting.

bleskes added 3 commits November 3, 2016 10:38

WIP

2105fdc

fix RareClusterStateIT.java

ee6f2c8

add tests

014b7b9

bleskes added the >bug, :Search Foundations/Mapping, :Cluster, v6.0.0-alpha1, and v5.1.1 labels Nov 3, 2016

jpountz approved these changes Nov 3, 2016

review feedback

f48cb3e

s1monw reviewed Nov 3, 2016

s1monw suggested changes Nov 3, 2016

bleskes added 2 commits November 13, 2016 14:47

Merge remote-tracking branch 'upstream/master' into mapping_dont_reus…

dc3ce4c

…e_index_service

add a test with plugins

0699098

s1monw approved these changes Nov 14, 2016

bleskes added 2 commits November 15, 2016 08:55

Merge remote-tracking branch 'upstream/master' into mapping_dont_reus…

7577ea1

…e_index_service

review feedback

f2259c3

bleskes merged commit 6d9af2f into elastic:master Nov 15, 2016

clintongormley removed the :Cluster label Nov 19, 2016

jonaf mentioned this pull request Mar 28, 2019

Improve indexing perf for indices with dynamic mappings and many aliases #40432

Closed

jonaf mentioned this pull request Apr 18, 2019

Metadata construction / put-mapping requests takes too long #38654

Closed

4 participants

bleskes commented Nov 3, 2016 · edited by clintongormley