Fixup highlighting with synthetic source#87667
Merged
nik9000 merged 5 commits intoelastic:masterfrom Jun 15, 2022
Merged
Conversation
Collaborator
|
Pinging @elastic/es-search (Team:Search) |
Collaborator
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Synthetic source has a habit of reordering text fields. This frustrates highlighting because it *often* wants to use index structures to find the offsets to values in the field. This disables the FVH highlighter for multi-valued text fields when synthetic source is enabled and runs the unified highlighter in "analyze" mode when synthetic source is enabled. That's *enough* to stop them from spitting out wrong answers. We might be leaving some performance on the table when the unified highlighter works on a single valued text field that is indexed with offsets or term vectors. We don't really expect that to be common at all though because *generally* folks will enable synthetic source to save space and adding offsets or term vectors is quite space inefficient. If it comes up, we might be able to improve here.
romseygeek
approved these changes
Jun 15, 2022
Contributor
romseygeek
left a comment
There was a problem hiding this comment.
LGTM, thanks Nik. I think we need one more test to ensure that we cover all paths in FVH but other than that this is good to merge.
| - match: { hits.hits.0.highlight.foo\.vectors.0: <em>the quick brown fox jumped over the lazy dog</em> } | ||
|
|
||
| --- | ||
| text multi fvh: |
Contributor
There was a problem hiding this comment.
Can you add a test that this also occurs when you ask for fragments in order, so that we exercise both FragmentsBuilder implementations?
Member
Author
There was a problem hiding this comment.
Can you add a test that this also occurs when you ask for fragments in order, so that we exercise both FragmentsBuilder implementations?
👍
Collaborator
|
Pinging @elastic/clients-team (Team:Clients) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Synthetic source has a habit of reordering text fields. This frustrates
highlighting because it often wants to use index structures to find
the offsets to values in the field. This disables the FVH highlighter
for multi-valued text fields when synthetic source is enabled and runs
the unified highlighter in "analyze" mode when synthetic source is
enabled. That's enough to stop them from spitting out wrong answers.
We might be leaving some performance on the table when the unified
highlighter works on a single valued text field that is indexed with
offsets or term vectors. We don't really expect that to be common at all
though because generally folks will enable synthetic source to save
space and adding offsets or term vectors is quite space inefficient. If
it comes up, we might be able to improve here.