Remove 'external values', and replace with swapped out XContentParsers#72203
Remove 'external values', and replace with swapped out XContentParsers#72203romseygeek merged 6 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-search (Team:Search) |
|
This doesn't remove the special handling in GeoPointFieldMapper and CompletionFieldMapper, but it preserves that behaviour while cleaning up all the other field mappers so I think it's worth it as a first step. We can look at better ways of dealing with multiple completion setups separately. |
romseygeek
left a comment
There was a problem hiding this comment.
As a followup I think it would be a good idea to always pass a wrapped XContentParser to multifields.parse() that throws an exception on nextToken(), to ensure that subfields don't inadvertently advance the parent parser.
| public void parse(ParseContext context) throws IOException { | ||
| if (context.externalValueSet()) { | ||
| throw new IllegalArgumentException("[rank_features] fields can't be used in multi-fields"); | ||
| } |
There was a problem hiding this comment.
I've moved these checks (where they existed) to the TypeParser so they are now caught at mapping time, instead of when you try and index your first document. There are probably more places where we could do this, but for now I've limited it to just those mappers that were already checking. I've added tests for them as well.
There was a problem hiding this comment.
This is a nice clean-up!
|
Thanks @romseygeek for tackling #56063, it is an important simplification.
For me an important goal of #56063 is to actually remove support for this entirely, whereas this PR proposes to still support it (but in a more solid/ elegant way). The problem is that this flexibility makes it hard to understand what data ended up getting indexed for a field, since a parent field is allowed to supply literally anything as an 'external value'. In fact the When the analytics+geo team discussed the issue, I think there was support for removing the ability for geopoints to pass geohashes to their subfields. And in our discussion with @jimczi he clarified that we don't have a strong use case for completion multi-fields (#34081) -- we added it preemptively, not in response to user feedback. So since all current uses of external values aren't necessary, we could just entirely remove the ability to set special data to be passed to a multi-field. |
I agree! But I don't think we can remove it in one shot, we need to keep supporting this for indexes created in 7x and forbid new indexes created using it, etc. So I view this PR as a first step, clearing up the general case and narrowing down the specific exceptions to only affect the two classes in question. We can follow up by preventing multifields on geo_point fields for new indexes. Removing it from completion field mappers might be a bit more complicated, but that's why I think this PR is worth doing to begin with, as it lets us consider that in isolation from the broader cleanup. |
|
It sounds like we're on the same page. I was just confused because I saw "Closes #56063" which will close the issue automatically. This refactor is really nice in that it removes the need for each field mapper to check external values. Only the parent mappers geopoint and completion need to consider special multi-field parsing. Is this the rough plan? Maybe we could add a list like this to #56063 to clarify.
|
| public void parse(ParseContext context) throws IOException { | ||
| if (context.externalValueSet()) { | ||
| throw new IllegalArgumentException("[rank_features] fields can't be used in multi-fields"); | ||
| } |
There was a problem hiding this comment.
This is a nice clean-up!
| contextSuggestField("timmy"), | ||
| contextSuggestField("starbucks") | ||
| )); | ||
| try (TokenStream ts = indexableFields.getFields("field.subsuggest")[0].tokenStream(Lucene.WHITESPACE_ANALYZER, null)) { |
There was a problem hiding this comment.
Could you explain what this tests?
There was a problem hiding this comment.
In my initial implementation contexts weren't getting transferred correctly to subfields, and iterating over the field tokenstream would throw an exception. This was picked up by rest tests, but we didn't have a unit test for it. As it's not easy to check that the field has been correctly built via introspection, I used this as a way to trigger the bug in a unit test so I could track it down more easily. I'll add a comment, I think it's probably worth keeping here just because it fails faster than a rest test.
| }; | ||
| } | ||
|
|
||
| public final ParseContext switchParser(XContentParser parser) { |
There was a problem hiding this comment.
I think we should mark this as deprecated, with a note we are actively deprecating + removing usages of it. That will prevent us from adding more usages accidentally. If there are any plugin authors using external values (which I hope there aren't), this will discourage them from just switching onto this method, which we hope to eventually remove.
|
|
||
| private static class CompletionParser extends FilterXContentParser { | ||
|
|
||
| boolean advanced = false; |
There was a problem hiding this comment.
This is trickier than I realized! I really hope we can eventually remove it.
There was a problem hiding this comment.
Small comment, I think we usually style it like @deprecated <explanation text>
elastic#72203) The majority of field mappers read a single value from their positioned XContentParser, and do not need to call nextToken. There is a general assumption that the same holds for any multifields defined on them, and so the XContentParser is passed down to their multifields builder as-is. This assumption does not hold for mappers that accept json objects, and so we have a second mechanism for passing values around called 'external values', where a mapper can set a specific value on its context and child mappers can then check for these external values before reading from xcontent. The disadvantage of this is that every field mapper now needs to check its context for external values. Because the values are defined by their java class, we can also know that in the vast majority of cases this functionality is unused. We have only two mappers that actually make use of this, CompletionFieldMapper and GeoPointFieldMapper. This commit removes external values entirely, and replaces it with the ability to pass a modified XContentParser to multifields. FieldMappers can just check the parser attached to their context for data and don't need to worry about multiple sources. Plugins implementing field mappers will need to take the removal of external values into account. Implementations that are passing structured objects as external values should instead use ParseContext.switchParser and wrap the objects using MapXContentParser.wrapObject(). GeoPointFieldMapper passes on a fake parser that just wraps its input data formatted as a geohash; CompletionFieldMapper has a slightly more complicated parser that in general wraps its metadata, but if textOrNull() is called without the parser being advanced just returns its text input. Relates to elastic#56063
#72203) (#72448) The majority of field mappers read a single value from their positioned XContentParser, and do not need to call nextToken. There is a general assumption that the same holds for any multifields defined on them, and so the XContentParser is passed down to their multifields builder as-is. This assumption does not hold for mappers that accept json objects, and so we have a second mechanism for passing values around called 'external values', where a mapper can set a specific value on its context and child mappers can then check for these external values before reading from xcontent. The disadvantage of this is that every field mapper now needs to check its context for external values. Because the values are defined by their java class, we can also know that in the vast majority of cases this functionality is unused. We have only two mappers that actually make use of this, CompletionFieldMapper and GeoPointFieldMapper. This commit removes external values entirely, and replaces it with the ability to pass a modified XContentParser to multifields. FieldMappers can just check the parser attached to their context for data and don't need to worry about multiple sources. Plugins implementing field mappers will need to take the removal of external values into account. Implementations that are passing structured objects as external values should instead use ParseContext.switchParser and wrap the objects using MapXContentParser.wrapObject(). GeoPointFieldMapper passes on a fake parser that just wraps its input data formatted as a geohash; CompletionFieldMapper has a slightly more complicated parser that in general wraps its metadata, but if textOrNull() is called without the parser being advanced just returns its text input. Relates to #56063
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call
nextToken. There is a generalassumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.
This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.
Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use
ParseContext.switchParserandwrap the objects using
MapXContentParser.wrapObject().GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if
textOrNull()is called withoutthe parser being advanced just returns its text input.
Relates to #56063