Lowercase normalizer is used for wildcard queries #28894

@dadoonet

Elasticsearch version (bin/elasticsearch --version): 6.2.2
Description of the problem including expected versus actual behavior:

Say you index the value Aa into a text field that uses a lowercasing analyzer.
Searching for aa* matches. Searching for Aa* does not match, which is expected because wildcard queries are not analyzed.

Say you index the value Aa into a keyword field that uses a lowercase normalizer.
Searching for aa* matches. Searching for Aa* matches as well, even though wildcard queries are not analyzed.

Steps to reproduce:

DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "foo": {
          "type": "text",
          "analyzer": "simple", 
          "fields": {
            "keyword": {
              "type": "keyword",
              "normalizer": "lowercase_normalizer"
            }
          }
        }
      }
    }
  }
}
PUT test/doc/1?refresh
{
  "foo": "Bbb Aaa"
}

# Does not match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo": "Bb*"
    }
  }
}
# Match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo": "bb*"
    }
  }
}
# Match but should not -> KO
GET test/_search
{
  "query": {
    "wildcard": {
      "foo.keyword": "Bb*"
    }
  }
}
# Match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo.keyword": "bb*"
    }
  }
}
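
The four searches above can be modeled with a small Python sketch. This is only a toy model of the observed behavior, not Elasticsearch or Lucene code: the function names `index_terms` and `wildcard_matches` are made up for illustration, whitespace splitting stands in for the `simple` analyzer, and `fnmatchcase` stands in for a case-sensitive wildcard match.

```python
from fnmatch import fnmatchcase


def index_terms(value, analyzed):
    """Toy model of index-time processing; both paths lowercase the value."""
    lowered = value.lower()
    # A text field is split into several terms; a keyword field keeps one term.
    return lowered.split() if analyzed else [lowered]


def wildcard_matches(terms, pattern, normalize_pattern):
    """Case-sensitive wildcard match, optionally lowercasing the pattern first."""
    if normalize_pattern:
        pattern = pattern.lower()
    return any(fnmatchcase(term, pattern) for term in terms)


text_terms = index_terms("Bbb Aaa", analyzed=True)      # ['bbb', 'aaa']
keyword_terms = index_terms("Bbb Aaa", analyzed=False)  # ['bbb aaa']

# foo (text): the pattern is used as is, so "Bb*" finds nothing.
print(wildcard_matches(text_terms, "Bb*", normalize_pattern=False))
# foo.keyword: the match behaves as if the pattern were lowercased first.
print(wildcard_matches(keyword_terms, "Bb*", normalize_pattern=True))
```

In this model the only difference between the two fields is whether the query pattern goes through the lowercasing step, which is the behavior the third search seems to expose.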

I spoke with @jpountz who thinks it might be related to https://issues.apache.org/jira/browse/LUCENE-8186

Opening the issue so we can track it.

Labels: :Search Relevance/Search, >bug, Team:Search Relevance, priority:normal
