Propagate MapperBuilderContext across merge calls#86946
Propagate MapperBuilderContext across merge calls#86946javanna merged 4 commits intoelastic:masterfrom
Conversation
With elastic#86166 we have added support for leaf fields that have dots in their names, without needing to expand their path to intermediate objects. That means that for instance a host.name keyword field mapper can be held directly by the root object mapper if the root has subobjects set to false. This opens up for situations where a field name containing dots can't necessarily be split into a leaf name and a parent path made of intermediate object mappers. In the field mapper merge code, we build the full path of the merged field making that now incorrect assumption, which causes the result of merged leaf fields to have incorrect full path. This is a bug although it got unnoticed so far as the full path of a field mapper is not so widely used compared to its leaf name (returned by the simpleName method). One place where the full path is used is when sorting the child mappers in ObjectMapper#serializeMappers which was causing MapperService#assertRefreshIsNotNeeded to trip as the result of the merge has fields in a different order compared to the incoming mappings. With this fix, we add a MapperBuilderContext argument to the different merge methods, propagate the MapperBuilderContext in all the merge calls and create a child context when needed, meaning whenever merging object mappers, so that their children mappers are created with the proper full path.
|
Pinging @elastic/es-search (Team:Search) |
|
|
||
| public abstract ContentPath path(); | ||
|
|
||
| public final MapperBuilderContext createMapperBuilderContext() { |
There was a problem hiding this comment.
This is not perfect, but more isolated now to the only place where it's needed. I am not entirely sure if we can remove it from here hence I removed the TODO.
|
|
||
| @Override | ||
| public final FieldMapper merge(Mapper mergeWith) { | ||
| public final FieldMapper merge(Mapper mergeWith, MapperBuilderContext mapperBuilderContext) { |
There was a problem hiding this comment.
I went for propagating the MapperBuilderContext instead of adding a String path argument, because in some cases we already have a mapper builder context instance that we can just propagate, and in general it seems like a good idea to start merging at the root level with a ROOT mapper builder context and then build a child context for each level, in the caller of the merge methods.
| assertEquals("host.name", textFieldMapper.simpleName()); | ||
| FieldMapper fieldMapper = textFieldMapper.multiFields.iterator().next(); | ||
| assertEquals("foo.metrics.host.name.keyword", fieldMapper.name()); | ||
| assertEquals("keyword", fieldMapper.simpleName()); |
There was a problem hiding this comment.
I am not too sure that these additional merge tests belong to this class. I don't know whether there's an existing place for specific merge tests. Also, these are weird merge tests because they don't really test merging different mapper but rather merge the same mappers and verify the resulting simpleName and name. I can see how that will look weird in a couple of years: "why did we write these tests again?". Thoughts?
There was a problem hiding this comment.
Maybe add a comment to the effect that we're testing that leaf fields with dots in their names merge correctly?
There was a problem hiding this comment.
Really I think we should rewrite all these tests so that they extend MapperServiceTestCase and test the actual merge paths that MapperService calls, but let's do that as a followup.
There was a problem hiding this comment.
interesting, do they now test a different codepath? I mean they call root.merge directly, instead of going through mapper service, is that what you mean?
There was a problem hiding this comment.
I mean they call root.merge directly, instead of going through mapper service, is that what you mean?
Exactly
romseygeek
left a comment
There was a problem hiding this comment.
LGTM. If we ever manage to move merging onto the Builders then I think we can remove a lot of this again, but for now we're stuck with it.
| assertEquals("host.name", textFieldMapper.simpleName()); | ||
| FieldMapper fieldMapper = textFieldMapper.multiFields.iterator().next(); | ||
| assertEquals("foo.metrics.host.name.keyword", fieldMapper.name()); | ||
| assertEquals("keyword", fieldMapper.simpleName()); |
There was a problem hiding this comment.
Maybe add a comment to the effect that we're testing that leaf fields with dots in their names merge correctly?
| assertEquals("host.name", textFieldMapper.simpleName()); | ||
| FieldMapper fieldMapper = textFieldMapper.multiFields.iterator().next(); | ||
| assertEquals("foo.metrics.host.name.keyword", fieldMapper.name()); | ||
| assertEquals("keyword", fieldMapper.simpleName()); |
There was a problem hiding this comment.
Really I think we should rewrite all these tests so that they extend MapperServiceTestCase and test the actual merge paths that MapperService calls, but let's do that as a followup.
…c#89389) When parsing mappings, we may find a field with same name specified twice, either because JSON duplicate keys are allowed, or because a mix of object notation and dot notation is used when providing mappings. The same can happen when applying dynamic mappings as part of parsing an incoming document, as well as when merging separate index templates that may contain the definition for the same field using a mix of object notation and dot notation. While we propagate the MapperBuilderContext across merge calls thanks to elastic#86946, we do not propagate the right context when we call merge on objects as part of parsing/building mappings. This causes a situation in which the leaf fields that result from the merge have the wrong path, which misses the first portion e.g. sub.field instead of obj.sub.field. This commit applies the correct mapper builder context when building the object mapper builder and two objects with same name are found. Relates to elastic#86946 Closes elastic#88573
…c#89389) When parsing mappings, we may find a field with same name specified twice, either because JSON duplicate keys are allowed, or because a mix of object notation and dot notation is used when providing mappings. The same can happen when applying dynamic mappings as part of parsing an incoming document, as well as when merging separate index templates that may contain the definition for the same field using a mix of object notation and dot notation. While we propagate the MapperBuilderContext across merge calls thanks to elastic#86946, we do not propagate the right context when we call merge on objects as part of parsing/building mappings. This causes a situation in which the leaf fields that result from the merge have the wrong path, which misses the first portion e.g. sub.field instead of obj.sub.field. This commit applies the correct mapper builder context when building the object mapper builder and two objects with same name are found. Relates to elastic#86946 Closes elastic#88573
…c#89389) When parsing mappings, we may find a field with same name specified twice, either because JSON duplicate keys are allowed, or because a mix of object notation and dot notation is used when providing mappings. The same can happen when applying dynamic mappings as part of parsing an incoming document, as well as when merging separate index templates that may contain the definition for the same field using a mix of object notation and dot notation. While we propagate the MapperBuilderContext across merge calls thanks to elastic#86946, we do not propagate the right context when we call merge on objects as part of parsing/building mappings. This causes a situation in which the leaf fields that result from the merge have the wrong path, which misses the first portion e.g. sub.field instead of obj.sub.field. This commit applies the correct mapper builder context when building the object mapper builder and two objects with same name are found. Relates to elastic#86946 Closes elastic#88573
With #86166 we have added support for leaf fields that have dots in their names, without needing to expand their path to intermediate objects. That means that for instance a host.name keyword field mapper can be held directly by the root object mapper if the root has subobjects set to false.
This opens up for situations where a field name containing dots can't necessarily be split into a leaf name and a parent path made of intermediate object mappers. In the field mapper merge code, we build the full path of the merged field making that now incorrect assumption, which causes the result of merged leaf fields to have incorrect full path. This is a bug although it got unnoticed so far as the full path of a field mapper is not so widely used compared to its leaf name (returned by the simpleName method). One place where the full path is used is when sorting the child mappers in ObjectMapper#serializeMappers which was causing MapperService#assertRefreshIsNotNeeded to trip as the result of the merge has fields in a different order compared to the incoming mappings, due to the different name between the original mapper and the merged mapper.
With this fix, we add a MapperBuilderContext argument to the different merge methods, propagate the MapperBuilderContext in all the merge calls and create a child context when needed, meaning whenever merging object mappers, so that their children mappers are created with the proper full path.