Skip to content

Expose splitOnWhitespace in Query String Query#20965

Merged
jimczi merged 3 commits intoelastic:masterfrom
jimczi:split_on_whitespace
Nov 2, 2016
Merged

Expose splitOnWhitespace in Query String Query#20965
jimczi merged 3 commits intoelastic:masterfrom
jimczi:split_on_whitespace

Conversation

@jimczi
Copy link
Copy Markdown
Contributor

@jimczi jimczi commented Oct 17, 2016

This change adds an option called split_on_whitespace which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true.
For instance the query "foo bar" would let the analyzer of the targeted field decide how the tokens should be splitted.
Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are:

  • A type option which similarly to the multi_match query defines how the free text should be parsed when multi fields are defined.
  • Simple range query with additional tokens like ">100 50" are broken when split_on_whitespace is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even when split_on_whitespace is set to false.
  • Since all this options would make the query_string_query very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.

Fixes #20841

Copy link
Copy Markdown
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left two comments about changes that will be needed for the backport

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need serialization protection when backported to the 5.x branch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually there should protection on 6.x too since we have not given up the idea about multi-version clusters yet

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here about the serialization protection

@jimczi
Copy link
Copy Markdown
Contributor Author

jimczi commented Oct 28, 2016

@dakrone @jpountz I pushed another commit to protect the serialization on prior versions. Can you take another look ?

Copy link
Copy Markdown
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. @dakrone might want another look just to be super sure though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably add an else clause that sets splitOnWhitespace to the appropriate value just to be super clear.

Copy link
Copy Markdown
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This change adds an option called `split_on_whitespace` which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true.
For instance the query `"foo bar"` would let the analyzer of the targeted field decide how the tokens should be splitted.
Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are:
* A `type` option which similarly to the `multi_match` query defines how the free text should be parsed when multi fields are defined.
* Simple range query with additional tokens like ">100 50" are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even when `split_on_whitespace` is set to false.
* Since all this options would make the `query_string_query` very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.
@jimczi jimczi merged commit 9d6fac8 into elastic:master Nov 2, 2016
@jimczi jimczi deleted the split_on_whitespace branch November 2, 2016 09:00
jimczi added a commit that referenced this pull request Nov 2, 2016
This change adds an option called `split_on_whitespace` which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true.
For instance the query `"foo bar"` would let the analyzer of the targeted field decide how the tokens should be splitted.
Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are:
* A `type` option which similarly to the `multi_match` query defines how the free text should be parsed when multi fields are defined.
* Simple range query with additional tokens like ">100 50" are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even when `split_on_whitespace` is set to false.
* Since all this options would make the `query_string_query` very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search/Search Search-related issues that do not fall into other categories v5.1.1 v6.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants