Mask wildcard query special characters on keyword queries by cbuescher · Pull Request #53127 · elastic/elasticsearch

cbuescher · 2020-03-04T17:21:55Z

Wildcard queries on keyword fields should get normalized, however this normalization
should exclude the two special characters * and ? in order to keep the wildcard query
itself intact.

Closes #46300
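
A minimal sketch of the masking idea follows. It assumes a normalizer can be modelled as a plain `UnaryOperator<String>`. The class name, the method name and the punctuation-stripping normalizer are all hypothetical, chosen so that the effect of masking is visible. The PR moved the masking into StringFieldType#wildcardQuery, and the real code works with Lucene analyzers rather than this simplified interface. Only the literal runs between `*` and `?` go through the normalizer; the two wildcard characters pass through untouched.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class WildcardMasking {

    /** Hypothetical normalizer: lowercases and strips everything except [a-z0-9]. */
    static final UnaryOperator<String> STRIP_AND_LOWERCASE =
        s -> s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");

    /** The two wildcard special characters that must survive normalization. */
    static boolean isWildcardChar(char c) {
        return c == '*' || c == '?';
    }

    /**
     * Normalizes only the literal segments of a wildcard pattern.
     * Simplification: backslash escaping of wildcard characters is not handled.
     */
    static String normalizeWildcardPattern(String pattern, UnaryOperator<String> normalizer) {
        StringBuilder result = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (isWildcardChar(c)) {
                // Flush the pending literal run through the normalizer, then keep the wildcard as-is.
                if (literal.length() > 0) {
                    result.append(normalizer.apply(literal.toString()));
                    literal.setLength(0);
                }
                result.append(c);
            } else {
                literal.append(c);
            }
        }
        // Normalize any trailing literal run.
        if (literal.length() > 0) {
            result.append(normalizer.apply(literal.toString()));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Naive: normalizing the whole pattern drops the wildcards.
        System.out.println(STRIP_AND_LOWERCASE.apply("Quick*Fox?"));                      // quickfox
        // Masked: only the literal parts are normalized.
        System.out.println(normalizeWildcardPattern("Quick*Fox?", STRIP_AND_LOWERCASE)); // quick*fox?
    }
}
```

With a normalizer that only lowercases, the naive and masked results would coincide. The difference appears as soon as the normalizer touches punctuation.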


elasticmachine · 2020-03-04T17:21:57Z

Pinging @elastic/es-search (:Search/Analysis)

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java

cbuescher · 2020-03-05T18:00:44Z

@jimczi I pushed some changes that move the code as you suggested.

cbuescher · 2020-03-10T14:27:34Z

@elasticmachine update branch

elasticmachine · 2020-03-10T14:27:35Z

merge conflict between base and head

jimczi

The StringFieldType changes look good to me.
I am less sure about the modifications to the _type field. I left a comment on how we could address it, but that can be done in a follow-up if you prefer.

server/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java

cbuescher · 2020-03-11T14:25:35Z

@jimczi @romseygeek thanks for the review, I added a commit that changes TypeFieldType to extend ConstantFieldType like you suggested and adapted tests where necessary. Hope this looks okay now.
For the backport to 7.x/7.6, if I'm not mistaken, we still need to support more than one "_type" value, so we cannot use ConstantFieldType there. It looks like wildcard queries are forwarded to termsQuery (via StringFieldType#wildcardQuery and TypeFieldType#termQuery), which only performs exact term matching and returns either MatchAllDocs or MatchNoDocs. I think we need to keep that behavior based on the "_type" field on 7.x: we are sure to have only one type, but it might be something other than "_doc" (e.g. from an old 6.x index). Am I more or less right about this? We can discuss this on the backport if you are okay with this change on master for now.

romseygeek · 2020-03-11T14:34:05Z

> For the backport of this to 7.x/7.6., if I'm not mistaken we still need to support more than one "_type" value, so cannot use ConstantFieldType here

We can have only one type in 7.x, but it may be something other than _doc. You should be able to get it from MapperService#type(), though.
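
On an index with exactly one type, a query on `_type` never needs to consult the index: either every document matches or none does. The sketch below models that decision for term and terms queries. `SingleTypeField`, `QueryKind` and the `typeName` constructor argument are hypothetical stand-ins. In Elasticsearch the result would be a Lucene MatchAllDocsQuery or MatchNoDocsQuery, and on 7.x the type name would come from MapperService#type() as suggested in the reply.

```java
import java.util.List;

public class SingleTypeField {

    /** Hypothetical stand-in for Lucene's MatchAllDocsQuery / MatchNoDocsQuery. */
    enum QueryKind { MATCH_ALL, MATCH_NONE }

    private final String typeName;

    /** typeName is the single mapping type of the index; on 7.x it is not necessarily "_doc". */
    SingleTypeField(String typeName) {
        this.typeName = typeName;
    }

    /** Exact term matching only: every document carries the same type value. */
    QueryKind termQuery(String value) {
        return typeName.equals(value) ? QueryKind.MATCH_ALL : QueryKind.MATCH_NONE;
    }

    /** A terms query matches everything as soon as one of the values equals the index's type. */
    QueryKind termsQuery(List<String> values) {
        for (String value : values) {
            if (termQuery(value) == QueryKind.MATCH_ALL) {
                return QueryKind.MATCH_ALL;
            }
        }
        return QueryKind.MATCH_NONE;
    }

    public static void main(String[] args) {
        // An index created on 6.x may still use a custom type name.
        SingleTypeField legacy = new SingleTypeField("my_type");
        System.out.println(legacy.termsQuery(List.of("_doc", "my_type"))); // MATCH_ALL
        System.out.println(legacy.termQuery("_doc"));                      // MATCH_NONE
    }
}
```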

server/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java

server/src/main/java/org/elasticsearch/index/mapper/ConstantFieldType.java

jimczi

The change looks good to me. I left one comment regarding the handling of prefix and wildcard queries for the _type field. Feel free to address it without further review.

jimczi · 2020-03-12T11:55:32Z

server/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java

-                        result = new MatchNoDocsQuery("[_type] was lexicographically greater than upper bound of range");
-                    }
-                }
+        protected boolean matches(String pattern, QueryShardContext context) {


I don't think we need to handle wildcards and prefixes here. We don't support prefix and wildcard queries on the _type field today, and since types are now a thing of the past, I don't think we should add this ability. Just checking that the pattern exactly matches the internal type should be enough.
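
The sketch below contrasts the exact match suggested for `_type` with a wildcard-aware match that a general constant-valued field might use. Class and method names are hypothetical. The signature is also simplified: the real `matches(String pattern, QueryShardContext context)` in the diff also receives a shard context, which is omitted here.

```java
public class ConstantMatching {

    /** Exact match, as suggested for the _type field: no prefix or wildcard semantics. */
    static boolean typeMatches(String pattern, String typeName) {
        return pattern.equals(typeName);
    }

    /** Wildcard-aware alternative for comparison: '*' matches any run, '?' exactly one character. */
    static boolean wildcardMatches(String pattern, String value) {
        return wildcardMatches(pattern, 0, value, 0);
    }

    // Simple recursive matcher; fine for short patterns, not tuned for performance.
    private static boolean wildcardMatches(String p, int pi, String v, int vi) {
        if (pi == p.length()) {
            return vi == v.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Let '*' consume zero or more characters of the value.
            for (int k = vi; k <= v.length(); k++) {
                if (wildcardMatches(p, pi + 1, v, k)) {
                    return true;
                }
            }
            return false;
        }
        if (vi == v.length()) {
            return false;
        }
        if (c == '?' || c == v.charAt(vi)) {
            return wildcardMatches(p, pi + 1, v, vi + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(typeMatches("_doc", "_doc"));    // true
        System.out.println(typeMatches("_d*", "_doc"));     // false: '*' is not special here
        System.out.println(wildcardMatches("_d*", "_doc")); // true
    }
}
```

Keeping `_type` to equality avoids adding query features to a field that is on its way out, which is the reviewer's point.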

cbuescher · 2020-03-12T15:29:58Z

@elasticmachine run elasticsearch-ci/2
@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/default-distro

cbuescher · 2020-03-12T16:25:23Z

@elasticmachine update branch


jimczi · 2020-03-18T12:35:26Z

@cbuescher I removed the v7.6.2 label since this PR doesn't need to be backported to 7.6. The backport to 7.7 is still missing, but it can be tracked here.


Wildcard queries on text fields should not apply the field's analyzer to the search query. However, we accidentally enabled this in elastic#53127 by moving the query normalization to the StringFieldType super type. This change fixes this by separating the notion of normalization from case insensitivity (as implemented in the `case_insensitive` flag). This is done because we still need to maintain normalization of the query string when the wildcard query method on the field type is requested from the `query_string` query parser. Wildcard queries on keyword fields should also continue to apply the field's normalizer, regardless of whether `case_insensitive` is set, because normalization could involve something other than lowercasing (e.g. substituting umlauts as in the GermanNormalizationFilter). Closes elastic#71403
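
The sketch below models the split between normalization and case insensitivity described above. It assumes a normalizer can be modelled as a `UnaryOperator<String>` and applies case insensitivity at match time through `java.util.regex.Pattern.CASE_INSENSITIVE`. `FieldKind`, `preparePattern`, the `normalizeRequested` flag and the umlaut-folding normalizer are all hypothetical. The normalizer is only a rough stand-in for something like the GermanNormalizationFilter and does not reproduce its actual rules. Keyword fields always normalize the literal parts of the pattern. Text fields do so only when the caller explicitly asks, which models the `query_string` case. `case_insensitive` stays an independent switch.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

public class WildcardNormalization {

    enum FieldKind { KEYWORD, TEXT }

    /** Hypothetical umlaut-folding normalizer (a-umlaut, o-umlaut, u-umlaut, sharp s); it does not lowercase. */
    static final UnaryOperator<String> FOLD_UMLAUTS = s -> s
        .replace("\u00e4", "a").replace("\u00f6", "o").replace("\u00fc", "u").replace("\u00df", "ss");

    /** Applies the normalizer to literal runs only, leaving '*' and '?' intact. */
    static String normalizeLiterals(String pattern, UnaryOperator<String> normalizer) {
        StringBuilder result = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*' || c == '?') {
                result.append(normalizer.apply(literal.toString())).append(c);
                literal.setLength(0);
            } else {
                literal.append(c);
            }
        }
        return result.append(normalizer.apply(literal.toString())).toString();
    }

    /** Keyword fields always normalize; text fields only when normalization is explicitly requested. */
    static String preparePattern(FieldKind kind, String pattern,
                                 UnaryOperator<String> normalizer, boolean normalizeRequested) {
        boolean normalize = kind == FieldKind.KEYWORD || normalizeRequested;
        return normalize ? normalizeLiterals(pattern, normalizer) : pattern;
    }

    /** Matches a prepared wildcard pattern against a term; caseInsensitive is independent of normalization. */
    static boolean matches(String preparedPattern, String term, boolean caseInsensitive) {
        StringBuilder regex = new StringBuilder();
        for (char c : preparedPattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE : 0;
        return Pattern.compile(regex.toString(), flags).matcher(term).matches();
    }

    public static void main(String[] args) {
        String indexedTerm = "Muller"; // what FOLD_UMLAUTS would index for "M\u00fcller"
        String query = "m\u00fcll*";

        String keywordPattern = preparePattern(FieldKind.KEYWORD, query, FOLD_UMLAUTS, false);
        System.out.println(keywordPattern);                               // mull*
        System.out.println(matches(keywordPattern, indexedTerm, false)); // false: case differs
        System.out.println(matches(keywordPattern, indexedTerm, true));  // true

        String textPattern = preparePattern(FieldKind.TEXT, query, FOLD_UMLAUTS, false);
        System.out.println(matches(textPattern, indexedTerm, true));     // false: umlaut not folded
    }
}
```

The keyword case shows why normalization cannot be tied to `case_insensitive`. The umlaut folding has to happen even when matching stays case sensitive.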

Mask wildcard query special characters on keyword queries

ec7ecf7

cbuescher added the >bug, :Search Relevance/Analysis, v8.0.0 and v7.7.0 labels Mar 4, 2020

jimczi reviewed Mar 4, 2020

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java (outdated)

Christoph Büscher added 2 commits March 5, 2020 18:54

Moving masking code to StringFieldType#wildcardQuery

1e3fc34

Merge branch 'master' into fix-46300

8cf6a71

iter

f0da91d

jimczi mentioned this pull request Mar 6, 2020

Wildcard field optimised for wildcard queries #49993

Merged

Merge branch 'master' into fix-46300

c4529d1

jimczi reviewed Mar 11, 2020

server/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java (outdated)

Christoph Büscher added 2 commits March 11, 2020 14:47

Change TypeFieldType to extend ConstantFieldType

e727b91

Merge branch 'master' into fix-46300

a59ea3e

iter

ddf828c

romseygeek reviewed Mar 11, 2020

server/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java (outdated)

server/src/main/java/org/elasticsearch/index/mapper/ConstantFieldType.java (outdated)

Christoph Büscher added 2 commits March 11, 2020 16:02

iter

c555c43

Fix mocks in test

db5ea54

jimczi approved these changes Mar 12, 2020

TypeFieldType#matches only needs to support exact match

4dfe63d

Merge branch 'master' into fix-46300

e466513

cbuescher merged commit facd525 into elastic:master Mar 12, 2020

cbuescher added the backport pending and v7.6.2 labels Mar 12, 2020

cbuescher mentioned this pull request Mar 12, 2020

Mask wildcard query special characters on keyword queries (#53127) #53512

Merged

jimczi removed the v7.6.2 label Mar 18, 2020

cbuescher removed the backport pending label Mar 24, 2020

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket (Part 3) elastic/elasticsearch-net#4534

Closed

This was referenced Apr 2, 2020

[CI] Test failure on 7.x branch: AbstractScalaEsScalaSparkSQL testDataSourcePushDown12And elastic/elasticsearch-hadoop#1456

Closed

Update Spark tests that use wildcard query elastic/elasticsearch-hadoop#1458

Merged

markharwood mentioned this pull request Apr 7, 2021

Wildcard queries on text fields don't obey rules for case sensitivity #71403

Closed

cbuescher mentioned this pull request Apr 15, 2021

Fix case sensitivity rules for wildcard queries on text fields #71751

Merged

cbuescher mentioned this pull request Apr 26, 2021

Fix case sensitivity rules for wildcard queries on text fields (#71751) #72214

Merged

cbuescher mentioned this pull request Apr 26, 2021

Fix case sensitivity rules for wildcard queries on text fields (#71751) #72216

Merged

cbuescher mentioned this pull request Apr 26, 2021

Fix case sensitivity rules for wildcard queries on text fields (#71751) #72224

Merged

cbuescher mentioned this pull request Jun 16, 2021

Inconsistent wildcard search results when combined with a trim normalizer #73138

Open

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

yrodiere mentioned this pull request May 18, 2022

Updates to docker-maven-plugin configuration so M1 mac build runs (almost) clean with -Dstart-containers quarkusio/quarkus#25648

Closed

romseygeek mentioned this pull request Jun 16, 2022

Nested query using wildcard filter with custom analyzer : query failed on 7.13.0 #87728

Closed

5 participants