Refactor field expansion for match, multi_match and query_string query#25726
Refactor field expansion for match, multi_match and query_string query#25726jimczi merged 3 commits intoelastic:masterfrom
Conversation
There was a problem hiding this comment.
@jimczi I just started playing with this (hence the short review), but I found something that causes some dissonance. If someone configures "default_field": "*" in the index settings, it does not work the same way as configuring "default_field": "*" in the query does.
Here's something that demonstrates the problem:
DELETE /test?pretty=false
{}
PUT /test?pretty=false
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0,
"query": {
"default_field": "*"
}
}
},
"mappings": {
"doc": {
"properties": {
"f1": {"type": "text"},
"f2": {"type": "text"}
}
}
}
}
POST /test/doc/1?refresh=wait_for&filter_path=result&pretty=false
{"f1": "foo", "f2": "bar"}
POST /test/_search
{
"query": {
"query_string": {
"query": "foo AND bar"
}
}
}
POST /test/_search
{
"query": {
"query_string": {
"query": "foo AND bar",
"default_field": "*"
}
}
}And the output:
{"acknowledged":true}
{"acknowledged":true,"shards_acknowledged":true,"index":"test"}
{"result":"created"}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"f1" : "foo",
"f2" : "bar"
}
}
]
}
}There was a problem hiding this comment.
I think this comment is misleading since this is a getter
|
Thanks @dakrone , good catch. |
d67c61a to
5a30987
Compare
|
This PR has been updated to refactor the expansion of field names for the main es queries (the simple query string should be done in a follow up). The new description reflects the latest status of the PR (and explains the main changes). @dakrone I need to document the new behavior and work a bit on tests but the code is ready for a review I think ;). |
Does this include the |
It's a separate type but you are right this include the |
dakrone
left a comment
There was a problem hiding this comment.
LGTM, I left some comments about whether we want to capture exception messages, but it's up to you for addressing them.
Thanks for doing this!
There was a problem hiding this comment.
I think "*".equals(defaultField) should use Regex.isMatchAllPattern
There was a problem hiding this comment.
Can we include the exception message? It might be good to know that a lenient query failed due to something that is avoidable
There was a problem hiding this comment.
I think the indentation is off here still? :)
There was a problem hiding this comment.
should this return a MatchNoDocsQuery like the other ones where an exception is thrown?
There was a problem hiding this comment.
Should we include the message of the exception here?
There was a problem hiding this comment.
Should we leave this in here, but mark it as deprecated? It still technically exists and I'm not sure we want people that have all_fields: true set to not be able to look up what the option does (until it's removed)
5a30987 to
33e79b6
Compare
This commit changes the way we handle field expansion in `match`, `multi_match` and `query_string` query. The main changes are: - For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping. - For partial field names (with `*` suffix), the expansion is done only on `keyword`, `text`, `date` and `number` field types. Other field types are simply ignored. - For all fields (`*`), the expansion is done on accepted field types only (see above) and metadata fields are also filtered. - The `*` notation can also be used to set `default_field` option on`query_string` query. This should replace the needs for the extra option `use_all_fields` which is deprecated in this change. This commit also rewrites simple `*` query to matchalldocs query when all fields are requested (Fixes elastic#25556). The same change should be done on `simple_query_string` for completeness. Relates elastic#25551
b8b8f80 to
8fbb2b9
Compare
Related to elastic#25726, this resolves elastic#25556 for the 5.x series by parsing "*" as a `MatchAllDocsQuery` instead of expanding it to a (potentially expensive) query on the `_field_names`.
|
Hey @jimczi, I'm seeing an odd difference between I'm using a weblogs data set with IP fields. If I send the following query, I get an error: A multi_match with The following query_string query using default_field, however, works fine: For me, it would be ideal if |
|
This is just a leftover of the |
|
Ah, I had no idea that |
… query_string This change is a continuation of elastic#25726 that aligns field expansions for the simple_query_string with the query_string and multi_match query. The main changes are: * For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping. * For partial field names (with * suffix), the expansion is done only on keyword, text, date, ip and number field types. Other field types are simply ignored. * For all fields (*), the expansion is done on accepted field types only (see above) and metadata fields are also filtered. The use_all_fields option is deprecated in this change and can be replaced by setting `*` in the fields parameter. This commit also changes how text fields are analyzed. Previously the default search analyzer (or the provided analyzer) was used to analyze every text part , ignoring the analyzer set on the field in the mapping. With this change, the field analyzer is used instead unless an analyzer has been forced in the parameter of the query. Finally now that all full text queries can handle the special "*" expansion (`all_fields` mode), the `index.query.default_field` is now set to `*` for indices created in 6.
… query_string (#26145) This change is a continuation of #25726 that aligns field expansions for the simple_query_string with the query_string and multi_match query. The main changes are: * For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping. * For partial field names (with * suffix), the expansion is done only on keyword, text, date, ip and number field types. Other field types are simply ignored. * For all fields (*), the expansion is done on accepted field types only (see above) and metadata fields are also filtered. The use_all_fields option is deprecated in this change and can be replaced by setting `*` in the fields parameter. This commit also changes how text fields are analyzed. Previously the default search analyzer (or the provided analyzer) was used to analyze every text part , ignoring the analyzer set on the field in the mapping. With this change, the field analyzer is used instead unless an analyzer has been forced in the parameter of the query. Finally now that all full text queries can handle the special "*" expansion (`all_fields` mode), the `index.query.default_field` is now set to `*` for indices created in 6.
… query_string (#26145) This change is a continuation of #25726 that aligns field expansions for the simple_query_string with the query_string and multi_match query. The main changes are: * For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping. * For partial field names (with * suffix), the expansion is done only on keyword, text, date, ip and number field types. Other field types are simply ignored. * For all fields (*), the expansion is done on accepted field types only (see above) and metadata fields are also filtered. The use_all_fields option is deprecated in this change and can be replaced by setting `*` in the fields parameter. This commit also changes how text fields are analyzed. Previously the default search analyzer (or the provided analyzer) was used to analyze every text part , ignoring the analyzer set on the field in the mapping. With this change, the field analyzer is used instead unless an analyzer has been forced in the parameter of the query. Finally now that all full text queries can handle the special "*" expansion (`all_fields` mode), the `index.query.default_field` is now set to `*` for indices created in 6.
… query_string (#26145) This change is a continuation of #25726 that aligns field expansions for the simple_query_string with the query_string and multi_match query. The main changes are: * For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping. * For partial field names (with * suffix), the expansion is done only on keyword, text, date, ip and number field types. Other field types are simply ignored. * For all fields (*), the expansion is done on accepted field types only (see above) and metadata fields are also filtered. The use_all_fields option is deprecated in this change and can be replaced by setting `*` in the fields parameter. This commit also changes how text fields are analyzed. Previously the default search analyzer (or the provided analyzer) was used to analyze every text part , ignoring the analyzer set on the field in the mapping. With this change, the field analyzer is used instead unless an analyzer has been forced in the parameter of the query. Finally now that all full text queries can handle the special "*" expansion (`all_fields` mode), the `index.query.default_field` is now set to `*` for indices created in 6.
This commit changes the way we handle field expansion in
match,multi_matchandquery_stringquery.The main changes are:
For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping.
For partial field names (with
*suffix), the expansion is done only onkeyword,text,date,ipandnumberfield types. Other field types are simply ignored.For all fields (
*), the expansion is done on accepted field types only (see above) and metadata fields are also filtered.The
*notation can also be used to setdefault_fieldoption onquery_stringquery. This should replace the needs for the extra optionuse_all_fieldswhich is deprecated in this change.This commit also rewrites simple
*query to matchalldocs query when all fields are requested (Fixes #25556).The same change should be done on
simple_query_stringfor completeness.use_all_fieldsoption inquery_stringis also deprecated in this change,default_fieldshould be set to*instead.Relates #25551