Skip to content

MatchNoDocsQuery from stop words with wildcards in query_string #28856

@rezecib

Description

@rezecib

Elasticsearch version (bin/elasticsearch --version): 6.2.2, Build: 10b1edd/2018-02-16T19:01:30.685723Z

Plugins installed: ["analysis-icu"]

JVM version (java -version): 9.0.4

OS version (uname -a if on a Unix-like system): Mac OS Sierra 10.12.6
16.7.0 Darwin Kernel Version 16.7.0: Thu Jan 11 22:59:40 PST 2018; root:xnu-3789.73.8~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:
When doing a query_string query with the "stop" analyzer and wildcard analysis enabled, if the query contains stop words and wildcards (on the stop words or on other query terms), the expected behavior (at least, the behavior in Elasticsearch 5.5.0) is for the stop word to be removed from the token stream; the actual behavior is that it gets converted to MatchNoDocsQuery.

Steps to reproduce:

  1. Start an elasticsearch process: bin/elasticsearch
  2. Create a simple index:
curl -X PUT "localhost:9200/stop-wildcard-test" -H "Content-Type: application/json" -d '{
	"mappings": {
		"doc": {
			"properties": {
				"content": { "type": "text" }
			}
		}
	}
}'
  1. Request validation for "on the run*":
curl -X POST "localhost:9200/stop-wildcard-test/_validate/query?explain&pretty" -H "Content-Type: application/json" -d '{
  "query": {
      "query_string": {
        "query": "on the run*",
        "analyzer": "stop",
        "analyze_wildcard": true
      }
  }
}'

Response:

{
  "valid" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "explanations" : [
    {
      "index" : "stop-wildcard-test",
      "valid" : true,
      "explanation" : "MatchNoDocsQuery(\"analysis was empty for content:on\") content:run*"
    }
  ]
}

I also tested the following queries:

  • on the run produces the expected content:run, without MatchNoDocsQuery
  • on* the run produces MatchNoDocsQuery(\"analysis was empty for content:on\") content:run
    Also tested with wildcard analysis off:
  • on the run* produces MatchNoDocsQuery(\"analysis was empty for content:on\") content:run*
  • on the run produces the expected content:run
  • on* the run produces content:on* content:run (unlike with wildcard analysis on)

For comparison, here is the response to the same query from a fresh installation of 5.5.0 (+ analysis_icu for parity, but probably not relevant here?):

{
    "valid": true,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "explanations": [
        {
            "index": "stop-wildcard-test",
            "valid": true,
            "explanation": "_all:run"
        }
    ]
}

Provide logs (if relevant):
Not relevant.

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions