Synthetic _source: support field in many cases#89950
Synthetic _source: support field in many cases#89950elasticsearchmachine merged 17 commits intoelastic:mainfrom
field in many cases#89950Conversation
This adds support for the `field` scripting API in many but not all cases. Before this change numbers, dates, and IPs supported the `field` API when running with _source in synthetic mode because they always have doc values. This change adds support for `match_only_text`, `store`d `keyword` fields, and `store`d `text` fields. Two remaining field configurations work with synthetic _source and do not work with `field`: * A `text` field with a sub-`keyword` field that has `doc_values` * A `text` field with a sub-`keyword` field that is `store`d
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
|
Hi @nik9000, I've created a changelog YAML for you. |
| public boolean alwaysEmpty() { | ||
| return sourceProvider.alwaysEmpty(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Can we do this via the FieldDataContext somehow? Putting it on the SourceLookup feels a bit weird to me.
There was a problem hiding this comment.
I can move it, but, like, it's the source lookup that is empty. I think maybe moving and renaming it is a good idea. I just don't know the right spot.
| throw new IllegalArgumentException(CONTENT_TYPE + " fields do not support sorting and aggregations"); | ||
| } | ||
| SourceLookup sourceLookup = fieldDataContext.lookupSupplier().get().source(); | ||
| if (sourceLookup.alwaysEmpty()) { |
There was a problem hiding this comment.
I've been using alwaysEmpty to signal that you should try synthetic source things. But maybe it'd be better to have it actually have a syntheticSource method instead. Always empty is a fairly broad concept.
There was a problem hiding this comment.
I think we could have source providers have a syntheticSource method instead - the only non-test place I believe we explicitly use the "null" source provider is from MappingLookup which checks if we have synthetic source.
|
Pinging @elastic/es-search (Team:Search) |
|
@romseygeek could you have another look? It's much bigger now.... |
… synthetic_source_fields_3
server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java
Outdated
Show resolved
Hide resolved
nik9000
left a comment
There was a problem hiding this comment.
@romseygeek this is ready for you again.
|
run |
|
run elasticsearch-ci/part-1 |
martijnvg
left a comment
There was a problem hiding this comment.
I'm happy with this change 👍 . I didn't do a line by line review, but how it works looks good to me.
|
So I know that I asked for the information about synthetic source to be moved to FieldDataContext but I have, annoyingly, changed my mind about that again. FDC holds information at the context in which field data is being asked for, and synthetic source is entirely orthogonal to that. I've opened #91400 to move this to MapperBuilderContext, which I think will fit better here as well - field types can get a constructor parameter telling them if they need to load things from a secret stored field or just use source. |
romseygeek
left a comment
There was a problem hiding this comment.
Nearly there, I left a couple of questions.
| if (hasDocValues()) { | ||
| return fieldDataFromDocValues(); | ||
| } | ||
| if (isSyntheticSource && isStored()) { |
There was a problem hiding this comment.
What happens if synthetic source is enabled and the data isn't stored? As I read it we fall through to using normal source, which will fail a bit unpredictably I think? Is it worth throwing an explicit Exception here in that case?
There was a problem hiding this comment.
It isn't possible to configure synthetic _source without doc values or stored fields. I'll add a hard test.
There was a problem hiding this comment.
Maybe add an assertion as well so its more obvious when reading the code?
| if (operation != FielddataOperation.SCRIPT) { | ||
| throw new IllegalStateException("unknown field data operation [" + operation.name() + "]"); | ||
| } | ||
| if (isSyntheticSource && isStored()) { |
There was a problem hiding this comment.
This one has a different answer! This will be possible in a follow up but isn't yet. I'll throw a helpful exception here too.
There was a problem hiding this comment.
This one has a different answer! This will be possible in a follow up but isn't yet. I'll throw a helpful exception here too.
There was a problem hiding this comment.
This one has a different answer! This will be possible in a follow up but isn't yet. I'll throw a helpful exception here too.
|
|
||
| public void testDocValues() throws IOException { | ||
| MapperService mapper = createMapperService(fieldMapping(b -> b.field("type", "text"))); | ||
| assertScriptDocValues(mapper, "foo", equalTo(List.of("foo"))); |
There was a problem hiding this comment.
I think it's worth testing that an input that would otherwise be tokenized is returned whole here. Basically test that we get "foo bar" back, rather than a list of "foo" and "bar".
| assertScriptDocValues(mapper, "foo", equalTo(List.of("foo"))); | ||
| } | ||
|
|
||
| @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/86603") |
There was a problem hiding this comment.
This links to the master synthetic source issue, is there a more specific one that this relates to?
There was a problem hiding this comment.
I'll un-awaitsfix it and have it catch the fancy new exception I'm throwing.
romseygeek
left a comment
There was a problem hiding this comment.
LGTM! Thanks for iterating.
This adds support for the
fieldscripting API in many but not all cases. Before this change numbers, dates, and IPs supported thefieldAPI when running with _source in synthetic mode because they always have doc values. This change adds support formatch_only_text,storedkeywordfields, andstoredtextfields. Two remaining field configurations work with synthetic _source and do not work withfield:textfield with a sub-keywordfield that hasdoc_valuestextfield with a sub-keywordfield that isstored