Uncommitted mapping updates should not efect existing indices#21306
Uncommitted mapping updates should not efect existing indices#21306bleskes merged 8 commits intoelastic:masterfrom
Conversation
jpountz
left a comment
There was a problem hiding this comment.
I left some comments. It is great to get the same behaviour regardless of whether the master node is a data node too!
| mapperService.merge(mapping.value.type(), mapping.value.source(), | ||
| MapperService.MergeReason.MAPPING_RECOVERY, request.updateAllTypes()); | ||
| } | ||
| indexMapperServices.put(index, mapperService); |
There was a problem hiding this comment.
should we put it in the map before we perform the merging, so that it is closed in the finally block if the merging fails too?
There was a problem hiding this comment.
good catch. will move
| String parentType = newMapper.parentFieldMapper().type(); | ||
| if (parentType.equals(mapping.value.type()) && | ||
| indexService.mapperService().getParentTypes().contains(parentType) == false) { | ||
| mapperService.getParentTypes().contains(parentType) == false) { |
There was a problem hiding this comment.
can you indent one level more? otherwise it's not obvious what belongs to the if statement and what belongs to the inner block
| return ClusterState.builder(currentState).metaData(builder).build(); | ||
| } else { | ||
| return currentState; | ||
| } |
There was a problem hiding this comment.
out of curiosity, does this optimization buy much or would eg. cluster state diffs notice that the mappings did not change?
There was a problem hiding this comment.
diffs will indeed optimize the network transmission time away but we still force a global sync of all nodes of the cluster - the master will publish this new state (with a 2 phase commit - so two rounds) and wait for the nodes to process it. This means that if some node is a bit busy it slows things down for nothing.
There was a problem hiding this comment.
++ I think this can buy use quite a fair bit of processing savings... yet, I wonder if we can improve the CS builder to detect this automatically (for sure not here)?
| searchOperationListeners, indexOperationListeners); | ||
| } | ||
|
|
||
| public MapperService newIndexMapperService(MapperRegistry mapperRegistry) throws IOException { |
There was a problem hiding this comment.
Can you add documentation that this mapper service may only be used fo administrative purposes, and not eg. actuallly parsing documents?
| indicesQueriesRegistry, clusterService, client, indicesQueryCache, mapperRegistry, indicesFieldDataCache); | ||
| } | ||
|
|
||
| public synchronized MapperService createIndexMapperService(IndexMetaData indexMetaData) throws IOException { |
|
thx @jpountz . I pushed another commit with all the feedback. Can you take another look? |
| */ | ||
| public MapperService newIndexMapperService(MapperRegistry mapperRegistry) throws IOException { | ||
| return new MapperService(indexSettings, analysisRegistry.build(indexSettings), | ||
| new SimilarityService(indexSettings, similarities), mapperRegistry, |
There was a problem hiding this comment.
I wonder if we should throw and AssertionError here instead... it should not happen and should not be caught?
There was a problem hiding this comment.
good question. I opted for UOE as otherwise I would have to either construct a QueryShardContext or return null that will explode later. I think this is the simplest?
There was a problem hiding this comment.
you answer doesn't make sense just use throw new AssertionError("no index query shard context available"); instead?
| indicesToClose.add(indexMetaData.getIndex()); | ||
| IndexService indexService = indicesService.createIndex(indexMetaData, Collections.emptyList()); | ||
| if (indexMapperServices.containsKey(indexMetaData.getIndex()) == false) { | ||
| MapperService mapperService = indicesService.createIndexMapperService(indexMetaData); |
There was a problem hiding this comment.
I must have missed it but don't we have to close this now? I don't see where it's closed... also on exception we should close all opened ones?
There was a problem hiding this comment.
it is closed in the finally block: IOUtils.close(indexMapperServices.values());
| final Index index = indexMetaData.getIndex(); | ||
| final Predicate<String> indexNameMatcher = (indexExpression) -> indexNameExpressionResolver.matchesIndex(index.getName(), indexExpression, clusterService.state()); | ||
| final IndexSettings idxSettings = new IndexSettings(indexMetaData, this.settings, indexNameMatcher, indexScopeSetting); | ||
| final IndexModule indexModule = new IndexModule(idxSettings, indexStoreConfig, analysisRegistry); |
There was a problem hiding this comment.
can we add a test that actually uses a custom field mapper and ensure that it's registered here?
s1monw
left a comment
There was a problem hiding this comment.
left two comments LGTM otherwise
| ClusterState result2 = mappingService.putMappingExecutor.execute(result, Collections.singletonList(request)) | ||
| .resultingState; | ||
|
|
||
| assertTrue(result == result2); |
| * Tests that teh {@link MapperService} created by {@link IndicesService#createIndexMapperService(IndexMetaData)} contains | ||
| * custom types and similarities registered by plugins | ||
| */ | ||
| public void testStandAloneMapperServiceWithPlugins() throws IOException { |
When processing a mapping updates, the master current creates an `IndexService` and uses its mapper service to do the hard work. However, if the master is also a data node and it already has an instance of `IndexService`, we currently reuse the the `MapperService` of that instance. Sadly, since mapping updates are change the in memory objects, this means that a mapping change that can rejected later on during cluster state publishing will leave a side effect on the index in question, bypassing the cluster state safety mechanism. This commit removes this optimization and replaces the `IndexService` creation with a direct creation of a `MapperService`. Also, this fixes an issue multiple from multiple shards for the same field caused unneeded cluster state publishing as the current code always created a new cluster state. This were discovered while researching #21189
When processing a mapping updates, the master current creates an
IndexServiceand uses its mapper service to do the hard work. However, if the master is also a data node and it already has an instance ofIndexService, we currently reuse the theMapperServiceof that instance. Sadly, since mapping updates are change the in memory objects, this means that a mapping change that can rejected later on during cluster state publishing will leave a side effect on the index in question, bypassing the cluster state safety mechanism.This commit removes this optimization and replaces the
IndexServicecreation with a direct creation of aMapperService.Also, this fixes an issue multiple from multiple shards for the same field caused unneeded cluster state publishing as the current code always created a new cluster state.
This were discovered while researching #21189