Skip to content

Uncommitted mapping updates should not efect existing indices#21306

Merged
bleskes merged 8 commits intoelastic:masterfrom
bleskes:mapping_dont_reuse_index_service
Nov 15, 2016
Merged

Uncommitted mapping updates should not efect existing indices#21306
bleskes merged 8 commits intoelastic:masterfrom
bleskes:mapping_dont_reuse_index_service

Conversation

@bleskes
Copy link
Copy Markdown
Contributor

@bleskes bleskes commented Nov 3, 2016

When processing a mapping updates, the master current creates an IndexService and uses its mapper service to do the hard work. However, if the master is also a data node and it already has an instance of IndexService, we currently reuse the the MapperService of that instance. Sadly, since mapping updates are change the in memory objects, this means that a mapping change that can rejected later on during cluster state publishing will leave a side effect on the index in question, bypassing the cluster state safety mechanism.

This commit removes this optimization and replaces the IndexService creation with a direct creation of a MapperService.

Also, this fixes an issue multiple from multiple shards for the same field caused unneeded cluster state publishing as the current code always created a new cluster state.

This were discovered while researching #21189

@bleskes bleskes added >bug :Search Foundations/Mapping Index mappings, including merging and defining field types :Cluster v6.0.0-alpha1 v5.1.1 labels Nov 3, 2016
Copy link
Copy Markdown
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. It is great to get the same behaviour regardless of whether the master node is a data node too!

mapperService.merge(mapping.value.type(), mapping.value.source(),
MapperService.MergeReason.MAPPING_RECOVERY, request.updateAllTypes());
}
indexMapperServices.put(index, mapperService);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we put it in the map before we perform the merging, so that it is closed in the finally block if the merging fails too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. will move

String parentType = newMapper.parentFieldMapper().type();
if (parentType.equals(mapping.value.type()) &&
indexService.mapperService().getParentTypes().contains(parentType) == false) {
mapperService.getParentTypes().contains(parentType) == false) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you indent one level more? otherwise it's not obvious what belongs to the if statement and what belongs to the inner block

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

return ClusterState.builder(currentState).metaData(builder).build();
} else {
return currentState;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, does this optimization buy much or would eg. cluster state diffs notice that the mappings did not change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diffs will indeed optimize the network transmission time away but we still force a global sync of all nodes of the cluster - the master will publish this new state (with a 2 phase commit - so two rounds) and wait for the nodes to process it. This means that if some node is a bit busy it slows things down for nothing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I think this can buy use quite a fair bit of processing savings... yet, I wonder if we can improve the CS builder to detect this automatically (for sure not here)?

searchOperationListeners, indexOperationListeners);
}

public MapperService newIndexMapperService(MapperRegistry mapperRegistry) throws IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add documentation that this mapper service may only be used fo administrative purposes, and not eg. actuallly parsing documents?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. added.

indicesQueriesRegistry, clusterService, client, indicesQueryCache, mapperRegistry, indicesFieldDataCache);
}

public synchronized MapperService createIndexMapperService(IndexMetaData indexMetaData) throws IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@bleskes
Copy link
Copy Markdown
Contributor Author

bleskes commented Nov 3, 2016

thx @jpountz . I pushed another commit with all the feedback. Can you take another look?

*/
public MapperService newIndexMapperService(MapperRegistry mapperRegistry) throws IOException {
return new MapperService(indexSettings, analysisRegistry.build(indexSettings),
new SimilarityService(indexSettings, similarities), mapperRegistry,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should throw and AssertionError here instead... it should not happen and should not be caught?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. I opted for UOE as otherwise I would have to either construct a QueryShardContext or return null that will explode later. I think this is the simplest?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you answer doesn't make sense just use throw new AssertionError("no index query shard context available"); instead?

Copy link
Copy Markdown
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments

indicesToClose.add(indexMetaData.getIndex());
IndexService indexService = indicesService.createIndex(indexMetaData, Collections.emptyList());
if (indexMapperServices.containsKey(indexMetaData.getIndex()) == false) {
MapperService mapperService = indicesService.createIndexMapperService(indexMetaData);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have missed it but don't we have to close this now? I don't see where it's closed... also on exception we should close all opened ones?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is closed in the finally block: IOUtils.close(indexMapperServices.values());

final Index index = indexMetaData.getIndex();
final Predicate<String> indexNameMatcher = (indexExpression) -> indexNameExpressionResolver.matchesIndex(index.getName(), indexExpression, clusterService.state());
final IndexSettings idxSettings = new IndexSettings(indexMetaData, this.settings, indexNameMatcher, indexScopeSetting);
final IndexModule indexModule = new IndexModule(idxSettings, indexStoreConfig, analysisRegistry);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test that actually uses a custom field mapper and ensure that it's registered here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test

@bleskes
Copy link
Copy Markdown
Contributor Author

bleskes commented Nov 13, 2016

@jpountz @s1monw thx for the feedback. I merged from master and added a test with plugins making sure custom fields and similarities are picked up. I tried to come up with the simplest test possible but please take a critical look.

Copy link
Copy Markdown
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left two comments LGTM otherwise

ClusterState result2 = mappingService.putMappingExecutor.execute(result, Collections.singletonList(request))
.resultingState;

assertTrue(result == result2);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertSame?

* Tests that teh {@link MapperService} created by {@link IndicesService#createIndexMapperService(IndexMetaData)} contains
* custom types and similarities registered by plugins
*/
public void testStandAloneMapperServiceWithPlugins() throws IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@bleskes
Copy link
Copy Markdown
Contributor Author

bleskes commented Nov 15, 2016

Thx @s1monw , @jpountz . I'll merge it into master and wait a day before back porting.

@bleskes bleskes merged commit 6d9af2f into elastic:master Nov 15, 2016
bleskes added a commit that referenced this pull request Nov 16, 2016
When processing a mapping updates, the master current creates an `IndexService` and uses its mapper service to do the hard work. However, if the master is also a data node and it already has an instance of `IndexService`, we currently reuse the the `MapperService` of that instance. Sadly, since mapping updates are change the in memory objects, this means that a mapping change that can rejected later on during cluster state publishing will leave a side effect on the index in question, bypassing the cluster state safety mechanism.

This commit removes this optimization and replaces the `IndexService` creation with a direct creation of a `MapperService`.

Also, this fixes an issue multiple from multiple shards for the same field caused unneeded cluster state publishing as the current code always created a new cluster state.

This were discovered while researching #21189
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Search Foundations/Mapping Index mappings, including merging and defining field types v5.1.1 v6.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants