Root search analyzer doesn't act as default for fields #3102

@gakhov

Recently, we moved from version 0.19.11 to the latest stable release, 0.90.0, and found some very strange behavior that looks like a bug.

If we specify a custom index_analyzer and search_analyzer (or just analyzer) at the root level of the item mapping, the index_analyzer works as expected, but the search_analyzer does not. Also, if we update the existing mapping and explicitly specify search_analyzer at the field level, it still doesn't seem to take effect, and ES uses the standard analyzer.

To reproduce:

Create a new index and define our custom analyzer de_stem:

curl -XPUT 'http://localhost:9200/issue/' -d '{"index": {"number_of_shards": 1,"analysis": {"filter": {"de_snowball": {"type": "snowball","language": "German"}},"analyzer": {"de_stem": {"type": "custom","tokenizer": "standard","filter": ["lowercase", "de_snowball"]}}},"number_of_replicas": 0}}'
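
The one-line settings body is hard to check by eye, so here is a minimal sketch of the same settings written out as a nested structure in Python (standard library only). The variable names `index_settings` and `body` are illustrative, and placing number_of_replicas under "index" next to number_of_shards is an assumption about the intended nesting.

```python
import json

# Settings from the create-index request, written out as a nested dict.
# The names `index_settings` and `body` are illustrative only.
index_settings = {
    "index": {
        "number_of_shards": 1,
        "number_of_replicas": 0,  # assumed to belong under "index"
        "analysis": {
            "filter": {
                # Snowball stemming filter configured for German.
                "de_snowball": {"type": "snowball", "language": "German"},
            },
            "analyzer": {
                # Custom analyzer: standard tokenizer, lowercase, then stemming.
                "de_stem": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "de_snowball"],
                },
            },
        },
    }
}

# Serialize to the string that would follow curl's -d flag.
body = json.dumps(index_settings)
print(json.dumps(index_settings, indent=2))
```

Serializing from a dict guarantees balanced braces, which is easy to get wrong in a hand-written one-liner.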

Put a mapping that specifies index_analyzer and search_analyzer at the root level:

curl -XPUT 'http://localhost:9200/issue/item/_mapping' -d '{"item": {"index_analyzer" : "de_stem","search_analyzer" : "de_stem","properties": {"content": {"dynamic": false,"properties": {"body": {"type": "string"}}}}}}'
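
To make the expectation concrete, the sketch below writes the mapping as a Python dict and models the fallback this report expects: a field without its own analyzer setting inherits the root-level search_analyzer. The helper `expected_search_analyzer` is hypothetical and describes the expected behavior only; it is not Elasticsearch's actual resolution logic.

```python
# Mapping from the put-mapping request, as a nested dict.
item_mapping = {
    "item": {
        "index_analyzer": "de_stem",
        "search_analyzer": "de_stem",
        "properties": {
            "content": {
                "dynamic": False,
                "properties": {"body": {"type": "string"}},
            }
        },
    }
}


def expected_search_analyzer(mapping, type_name, field_path):
    """Return the search analyzer this report expects for a dotted field path.

    Hypothetical helper: a field-level "search_analyzer" or "analyzer" wins,
    otherwise the root-level setting of the type applies, otherwise "standard".
    """
    root = mapping[type_name]
    node = root
    # Walk down the nested "properties" objects, one path segment at a time.
    for part in field_path.split("."):
        node = node["properties"][part]
    for level in (node, root):
        for key in ("search_analyzer", "analyzer"):
            if key in level:
                return level[key]
    return "standard"


print(expected_search_analyzer(item_mapping, "item", "content.body"))
```

Under this model content.body resolves to de_stem, which is what the Analyze API call in the next step is expected to confirm.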

Check the search_analyzer for the field content.body with the Analyze API:

curl -XGET 'localhost:9200/issue/_analyze?pretty=true&field=content.body' -d 'Apple'

Actual result:

{
  "tokens" : [ {
    "token" : "apple",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

Expected result:

{
  "tokens" : [ {
    "token" : "appl",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

The correct (expected) result can still be obtained, but only by explicitly specifying the analyzer:

curl -XGET 'localhost:9200/issue/_analyze?pretty=true&field=content.body&analyzer=de_stem' -d 'Apple'
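
The difference between the two responses can be checked mechanically. The sketch below is a small illustration that pulls the token strings out of the actual and expected Analyze API responses quoted above; the helper name `tokens` is hypothetical.

```python
import json

# Response bodies copied from the "Actual result" and "Expected result" above.
actual_response = """
{"tokens": [{"token": "apple", "start_offset": 0, "end_offset": 5,
             "type": "<ALPHANUM>", "position": 1}]}
"""
expected_response = """
{"tokens": [{"token": "appl", "start_offset": 0, "end_offset": 5,
             "type": "<ALPHANUM>", "position": 1}]}
"""


def tokens(analyze_response):
    """Extract the token strings from an Analyze API response body."""
    return [t["token"] for t in json.loads(analyze_response)["tokens"]]


print(tokens(actual_response))    # ['apple']
print(tokens(expected_response))  # ['appl']
print(tokens(actual_response) == tokens(expected_response))  # False
```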

Index analyzer is set correctly

As shown above, the search_analyzer does not seem to have been set, but the index_analyzer works well.

Let's index a document:

curl -XPUT 'http://localhost:9200/issue/item/1' -d '{"content" : {"body": "10 Things We Hate About Apple"}}'
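
For scripted reproduction, the same indexing request can be built with an explicit HTTP method. This is a minimal sketch using Python's standard urllib; the helper `build_request` and the constant `ES_URL` are hypothetical names, and the request is only constructed here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # host and port as used in the curl commands


def build_request(method, path, body):
    """Build (but do not send) a JSON request with an explicit HTTP method."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(ES_URL + path, data=data, method=method)


req = build_request(
    "PUT",
    "/issue/item/1",
    {"content": {"body": "10 Things We Hate About Apple"}},
)
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```

Passing the method explicitly avoids relying on the client's default, which for a request with a body is POST in both curl and urllib.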

If index_analyzer was set correctly to de_stem, the word Apple should be indexed as appl, not apple (as the standard analyzer would produce).

Let's search for appl first:

curl -XGET 'http://localhost:9200/issue/_search?search_type=count&pretty=true' -d '{"query":{"query_string":{"fields":["content.body"],"query":"appl"}}}'

It works! We get back 1 result:

  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [ ]
  }

For the word apple, as expected given the problem above, the search finds nothing: the search_analyzer is standard but the index_analyzer is de_stem, so the search term stays apple while the indexed term is appl:

curl -XGET 'http://localhost:9200/issue/_search?search_type=count&pretty=true' -d '{"query":{"query_string":{"fields":["content.body"],"query":"apple"}}}'

  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  }
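
The two search results follow directly from the analyzer mismatch, which a toy model can reproduce. The sketch below is an illustration only: `toy_stem_analyze` is a made-up stand-in that merely drops a trailing "e" (it is NOT the German Snowball stemmer), and `standard_analyze` is a simplified stand-in for the standard analyzer. All function names are hypothetical.

```python
def standard_analyze(text):
    """Toy stand-in for the standard analyzer: split on whitespace, lowercase."""
    return [word.lower() for word in text.split()]


def toy_stem_analyze(text):
    """Toy stand-in for de_stem: lowercase, then drop one trailing "e".

    This is not the Snowball algorithm; it only reproduces apple -> appl.
    """
    return [t[:-1] if t.endswith("e") else t for t in standard_analyze(text)]


def build_index(docs, analyzer):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyzer):
    """Return ids of documents matching any analyzed query term."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "10 Things We Hate About Apple"}
index = build_index(docs, toy_stem_analyze)  # index time: stemming analyzer

print(search(index, "appl", standard_analyze))   # {1}
print(search(index, "apple", standard_analyze))  # set()
print(search(index, "apple", toy_stem_analyze))  # {1}
```

The last call shows what the report expects: once the same stemming analyzer is applied at search time, apple matches the indexed term appl.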

Specifying search analyzer with Put Mapping API doesn't help

OK, I tried to update the mapping and explicitly specify the search analyzer for the content.body field on the existing index created above:

curl -XPUT 'http://localhost:9200/issue/item/_mapping' -d '{"item": {"properties": {"content": {"dynamic": false,"properties": {"body": {"type": "string", "search_analyzer": "de_stem"}}}}}}'

The response is OK, but all the problems described above remain. So it seems the search_analyzer for the field content.body is still standard.

Labels

:Search Foundations/Mapping (Index mappings, including merging and defining field types)
Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)
