Expose splitOnWhitespace in Query String Query #20965
Merged
jimczi merged 3 commits into elastic:master from jimczi:split_on_whitespace on Nov 2, 2016
dakrone (Member) approved these changes on Oct 17, 2016:

LGTM. I left two comments about changes that will be needed for the backport.
Member: This will need serialization protection when backported to the 5.x branch.
Contributor: Actually, there should be protection on 6.x too, since we have not given up on the idea of multi-version clusters yet.
Member: Same here about the serialization protection.
nik9000 (Member) approved these changes on Oct 30, 2016:

I'd probably add an else clause that sets splitOnWhitespace to the appropriate value, just to be super clear.
jimczi added a commit that referenced this pull request on Nov 2, 2016
This change adds an option called `split_on_whitespace` which, when set to false, prevents the query parser from splitting free-text parts on whitespace prior to analysis. Instead, the query parser splits only around real operators. It defaults to true. For instance, the query `"foo bar"` would let the analyzer of the target field decide how the tokens should be split.

Some options are missing from this change, but I'd like to add them in a follow-up PR to simplify the backport to 5.x. The missing options (changes) are:

* A `type` option which, similar to the `multi_match` query, defines how the free text should be parsed when multiple fields are defined.
* Simple range queries with additional tokens, like ">100 50", are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of it even when `split_on_whitespace` is set to false.
* Since all these options would make the `query_string_query` very similar to a match (`multi_match`) query, we should be able to share the code that produces the final Lucene query.

Fixes #20841
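To make the behavior described above concrete, here is a toy Python model. It is not Elasticsearch's or Lucene's parser. It assumes a hypothetical field analyzer with a single multi-word synonym rule (`new york` becomes `ny`) and a simplified `>N` range syntax, and it ignores boolean operators entirely. With splitting enabled, the analyzer only ever sees one word at a time, so a multi-word rule cannot fire. With splitting disabled, the analyzer sees the whole text. The same toy also shows one plausible way ">100 50" breaks when the text is not split: the range prefix swallows the trailing token.

```python
# Hypothetical analyzer rule: one multi-word synonym. It can only fire
# when both words reach the analyzer in the same call.
SYNONYMS = {("new", "york"): "ny"}


def analyze(text):
    """Toy analyzer: lowercase, tokenize on whitespace, apply SYNONYMS."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def parse(text, split_on_whitespace=True):
    """Toy query parser returning ("term", t) or ("gt", n) clauses."""
    # Pre-splitting hands the analyzer one word at a time; otherwise the
    # whole free-text chunk goes to the analyzer at once.
    chunks = text.split() if split_on_whitespace else [text]
    clauses = []
    for chunk in chunks:
        if chunk.startswith(">"):
            # Simplified range syntax: everything after '>' is the bound.
            bound = chunk[1:]
            if not bound.isdigit():
                raise ValueError(f"invalid range bound: {bound!r}")
            clauses.append(("gt", int(bound)))
        else:
            clauses.extend(("term", t) for t in analyze(chunk))
    return clauses
```

Here `parse("New York", split_on_whitespace=False)` yields a single `("term", "ny")` clause, while the default yields separate `new` and `york` terms. With splitting disabled, `parse(">100 50", split_on_whitespace=False)` raises, because the bound becomes `"100 50"`.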