Skip to content

Propagate MapperBuilderContext across merge calls#86946

Merged
javanna merged 4 commits intoelastic:masterfrom
javanna:fix/mapper_merge_dots
May 20, 2022
Merged

Propagate MapperBuilderContext across merge calls#86946
javanna merged 4 commits intoelastic:masterfrom
javanna:fix/mapper_merge_dots

Conversation

@javanna
Copy link
Copy Markdown
Contributor

@javanna javanna commented May 19, 2022

With #86166 we have added support for leaf fields that have dots in their names, without needing to expand their path to intermediate objects. That means that for instance a host.name keyword field mapper can be held directly by the root object mapper if the root has subobjects set to false.

This opens up for situations where a field name containing dots can't necessarily be split into a leaf name and a parent path made of intermediate object mappers. In the field mapper merge code, we build the full path of the merged field making that now incorrect assumption, which causes the result of merged leaf fields to have incorrect full path. This is a bug although it got unnoticed so far as the full path of a field mapper is not so widely used compared to its leaf name (returned by the simpleName method). One place where the full path is used is when sorting the child mappers in ObjectMapper#serializeMappers which was causing MapperService#assertRefreshIsNotNeeded to trip as the result of the merge has fields in a different order compared to the incoming mappings, due to the different name between the original mapper and the merged mapper.

With this fix, we add a MapperBuilderContext argument to the different merge methods, propagate the MapperBuilderContext in all the merge calls and create a child context when needed, meaning whenever merging object mappers, so that their children mappers are created with the proper full path.

With elastic#86166 we have added support for leaf fields that have dots in their names, without needing to expand their path to intermediate objects. That means that for instance a host.name keyword field mapper can be held directly by the root object mapper if the root has subobjects set to false.

This opens up for situations where a field name containing dots can't necessarily be split into a leaf name and a parent path made of intermediate object mappers. In the field mapper merge code, we build the full path of the merged field making that now incorrect assumption, which causes the result of merged leaf fields to have incorrect full path. This is a bug although it got unnoticed so far as the full path of a field mapper is not so widely used compared to its leaf name (returned by the simpleName method). One place where the full path is used is when sorting the child mappers in ObjectMapper#serializeMappers which was causing MapperService#assertRefreshIsNotNeeded to trip as the result of the merge has fields in a different order compared to the incoming mappings.

With this fix, we add a MapperBuilderContext argument to the different merge methods, propagate the MapperBuilderContext in all the merge calls and create a child context when needed, meaning whenever merging object mappers, so that their children mappers are created with the proper full path.
@javanna javanna added >non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types v8.3.0 labels May 19, 2022
@javanna javanna requested a review from romseygeek May 19, 2022 22:34
@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 19, 2022
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (Team:Search)


public abstract ContentPath path();

public final MapperBuilderContext createMapperBuilderContext() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not perfect, but more isolated now to the only place where it's needed. I am not entirely sure if we can remove it from here hence I removed the TODO.


@Override
public final FieldMapper merge(Mapper mergeWith) {
public final FieldMapper merge(Mapper mergeWith, MapperBuilderContext mapperBuilderContext) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went for propagating the MapperBuilderContext instead of adding a String path argument, because in some cases we already have a mapper builder context instance that we can just propagate, and in general it seems like a good idea to start merging at the root level with a ROOT mapper builder context and then build a child context for each level, in the caller of the merge methods.

assertEquals("host.name", textFieldMapper.simpleName());
FieldMapper fieldMapper = textFieldMapper.multiFields.iterator().next();
assertEquals("foo.metrics.host.name.keyword", fieldMapper.name());
assertEquals("keyword", fieldMapper.simpleName());
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure that these additional merge tests belong to this class. I don't know whether there's an existing place for specific merge tests. Also, these are weird merge tests because they don't really test merging different mapper but rather merge the same mappers and verify the resulting simpleName and name. I can see how that will look weird in a couple of years: "why did we write these tests again?". Thoughts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to the effect that we're testing that leaf fields with dots in their names merge correctly?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really I think we should rewrite all these tests so that they extend MapperServiceTestCase and test the actual merge paths that MapperService calls, but let's do that as a followup.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, do they now test a different codepath? I mean they call root.merge directly, instead of going through mapper service, is that what you mean?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean they call root.merge directly, instead of going through mapper service, is that what you mean?

Exactly

Copy link
Copy Markdown
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If we ever manage to move merging onto the Builders then I think we can remove a lot of this again, but for now we're stuck with it.

assertEquals("host.name", textFieldMapper.simpleName());
FieldMapper fieldMapper = textFieldMapper.multiFields.iterator().next();
assertEquals("foo.metrics.host.name.keyword", fieldMapper.name());
assertEquals("keyword", fieldMapper.simpleName());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to the effect that we're testing that leaf fields with dots in their names merge correctly?

assertEquals("host.name", textFieldMapper.simpleName());
FieldMapper fieldMapper = textFieldMapper.multiFields.iterator().next();
assertEquals("foo.metrics.host.name.keyword", fieldMapper.name());
assertEquals("keyword", fieldMapper.simpleName());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really I think we should rewrite all these tests so that they extend MapperServiceTestCase and test the actual merge paths that MapperService calls, but let's do that as a followup.

@javanna javanna merged commit 095d1fc into elastic:master May 20, 2022
javanna added a commit to javanna/elasticsearch that referenced this pull request Aug 17, 2022
…c#89389)

When parsing mappings, we may find a field with same name specified twice, either
because JSON duplicate keys are allowed, or because a mix of object notation and dot
notation is used when providing mappings. The same can happen when applying dynamic
mappings as part of parsing an incoming document, as well as when merging separate
index templates that may contain the definition for the same field using a
mix of object notation and dot notation.

While we propagate the MapperBuilderContext across merge calls thanks to elastic#86946, we do not
propagate the right context when we call merge on objects as part of parsing/building
mappings. This causes a situation in which the leaf fields that result from the merge
have the wrong path, which misses the first portion e.g. sub.field instead of obj.sub.field.

This commit applies the correct mapper builder context when building the object mapper builder
and two objects with same name are found.

Relates to elastic#86946

Closes elastic#88573
elasticmachine pushed a commit to fcofdez/elasticsearch that referenced this pull request Aug 17, 2022
…c#89389)

When parsing mappings, we may find a field with same name specified twice, either
because JSON duplicate keys are allowed, or because a mix of object notation and dot
notation is used when providing mappings. The same can happen when applying dynamic
mappings as part of parsing an incoming document, as well as when merging separate
index templates that may contain the definition for the same field using a
mix of object notation and dot notation.

While we propagate the MapperBuilderContext across merge calls thanks to elastic#86946, we do not
propagate the right context when we call merge on objects as part of parsing/building
mappings. This causes a situation in which the leaf fields that result from the merge
have the wrong path, which misses the first portion e.g. sub.field instead of obj.sub.field.

This commit applies the correct mapper builder context when building the object mapper builder
and two objects with same name are found.

Relates to elastic#86946

Closes elastic#88573
grcevski pushed a commit to grcevski/elasticsearch that referenced this pull request Aug 17, 2022
…c#89389)

When parsing mappings, we may find a field with same name specified twice, either
because JSON duplicate keys are allowed, or because a mix of object notation and dot
notation is used when providing mappings. The same can happen when applying dynamic
mappings as part of parsing an incoming document, as well as when merging separate
index templates that may contain the definition for the same field using a
mix of object notation and dot notation.

While we propagate the MapperBuilderContext across merge calls thanks to elastic#86946, we do not
propagate the right context when we call merge on objects as part of parsing/building
mappings. This causes a situation in which the leaf fields that result from the merge
have the wrong path, which misses the first portion e.g. sub.field instead of obj.sub.field.

This commit applies the correct mapper builder context when building the object mapper builder
and two objects with same name are found.

Relates to elastic#86946

Closes elastic#88573
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants