Synthetic _source: support match_only_text#89516
Conversation
This adds support for synthetic `_source` to the `match_only_text` field type. When synthetic `_source` is enabled `match_only_text` fields create a hidden stored field to contain their text. This should have similar or better search performance for this specific field type, though it will have slightly worse indexing performance because synthetic `_source` is still writing `_recovery_source`, which means we're writing the bits for this field twice.
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
|
Hi @nik9000, I've created a changelog YAML for you. |
|
Pinging @elastic/es-search (Team:Search) |
| } catch (IOException e) { | ||
| throw new UncheckedIOException(e); | ||
| } | ||
| }; |
There was a problem hiding this comment.
There's a stored field lookup thing in searchExecutionContext.lookup() but it can't be convinced to load the hidden stored field. If we feel strongly about it I can try and integrate into it, but I'm not super sure how at the moment.
There was a problem hiding this comment.
Yeah that's strictly for scripts and is integrated poorly with other stored field lookup stuff, so I don't think it's worth trying to re-use it for the moment. I do think that the document lookup API I'm playing with at the moment will improve this though.
There was a problem hiding this comment.
It has a lovely caching mechanism that I think could be quite nice. If multiple queries need to recheck the source it'll load once while this won't.
There was a problem hiding this comment.
So, somewhat annoyingly, I don't think we will be able to re-use a stored field loader from the SearchLookup here because the underlying API expects a LeafReaderContext, not a LeafSearchLookup. But you should be able to use a LeafStoredFieldLoader rather than a FieldVisitor here which will at least be more readable.
romseygeek
left a comment
There was a problem hiding this comment.
This looks great! I think it may be worth waiting for a few days to see if I can get document loaders working though as it will tidy up the impl a fair amount.
| } catch (IOException e) { | ||
| throw new UncheckedIOException(e); | ||
| } | ||
| }; |
There was a problem hiding this comment.
Yeah that's strictly for scripts and is integrated poorly with other stored field lookup stuff, so I don't think it's worth trying to re-use it for the moment. I do think that the document lookup API I'm playing with at the moment will improve this though.
| context.addToFieldNames(fieldType().name()); | ||
|
|
||
| if (context.isSyntheticSource()) { | ||
| context.doc().add(new Field(fieldType().originalFieldName(), value, ORIGINAL_FIELD_TYPE)); |
There was a problem hiding this comment.
Use StoredField rather than creating a new field type
| throw new IllegalArgumentException(CONTENT_TYPE + " fields do not support sorting and aggregations"); | ||
| } | ||
|
|
||
| private String originalFieldName() { |
There was a problem hiding this comment.
Maybe call this storedFieldName to make it a bit clearer what it's used for?
| } | ||
| return fieldMapping(b -> b.field("type", "match_only_text")); | ||
| } | ||
|
|
There was a problem hiding this comment.
I think I'd be happier with coverage here if we explicitly run both tests for the synthetic and non-synthetic case, this feels at the moment like we're only testing 50% of the functionality on each run and I think that will come back to bite us.
👍 |
Adds more tests for the enrich processor around different index types. Right now they all work fine (yay!) but this feels like a good amount of paranoia.
|
@romseygeek this should be ready for you too! |
|
I almost clicked the merge button! It wouldn't have compiled! sneaky sneaky. |
This adds support for synthetic
_sourceto thematch_only_textfieldtype. When synthetic
_sourceis enabledmatch_only_textfieldscreate a hidden stored field to contain their text. This should have
similar or better search performance for this specific field type,
though it will have slightly worse indexing performance because
synthetic
_sourceis still writing_recovery_source, which meanswe're writing the bits for this field twice.