Nested search failing with _source disabled #43517

@sronsiek

Elasticsearch version (bin/elasticsearch --version):

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 7.0.1, Build: default/docker/e4efcb5/2019-04-29T12:56:03.145736Z, JVM: 12.0.1

Plugins installed: []

bin/elasticsearch-plugin install --batch ingest-attachment

JVM version (java -version):

openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment (build 12.0.1+12)
OpenJDK 64-Bit Server VM (build 12.0.1+12, mixed mode, sharing)

OS version (uname -a if on a Unix-like system):

Elasticsearch is running in the official Elastic Docker container

Linux elastic 4.4.76-1-default #1 SMP Fri Jul 14 08:48:13 UTC 2017 (9a2885c) x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

I have been upgrading an existing application from Elastic v2.1.2 to v7.0.1.

The mapping for document types has _source disabled.

One feature we use is a nested field 'history'. Each record contains a history array, each element of which contains several properties (author, created_at, state).

Searches on these fields worked fine in v2.1.2, but relied on the now-deprecated 'include_in_parent' flag. In v7.0.1, queries on these fields fail with null-pointer exceptions. After a lot of bug-chasing, I found that the feature can be made to work if I include ANY one of the history properties in _source in the mapping. After this, all searches started working, even those on history fields other than the one included in _source. In our queries, _source is set to false both at the top level and within inner_hits, since all we need is the parent doc _id.
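
To illustrate the last point, here is a minimal sketch of a query that disables _source at both levels, assuming the test_doc index and history field from the reproduction steps below. The ES_URL and QUERY_BODY variables and the search_ids_only function are hypothetical names used only for this sketch.

```shell
# Base URL of the cluster (assumption: same local node as in the steps below).
ES_URL="${ES_URL:-http://localhost:9200}"

# Nested query that asks for no _source at the top level or inside inner_hits,
# so each hit carries only metadata such as the parent document _id.
QUERY_BODY='
{
    "_source": false,
    "query": {
        "nested": {
            "path": "history",
            "inner_hits": {
                "_source": false,
                "size": 50,
                "name": "history"
            },
            "query": {
                "range": {
                    "history.created_at": {
                        "gte": "2012-01-01"
                    }
                }
            }
        }
    }
}'

# Send the query to the test_doc index and print the raw JSON response.
search_ids_only() {
    curl -s -H 'Content-Type: application/json' \
        -X POST "$ES_URL/test_doc/_search" -d "$QUERY_BODY"
}
```

Calling search_ids_only once the index exists sends the request.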

Steps to reproduce:

  1. Create template with mapping & settings (without the workaround):
curl -H 'Content-Type: application/json' -X PUT http://localhost:9200/_template/test_doc -d '
{
  "index_patterns": "test_doc*",
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "doc_body": {
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          }
        },
        "type": "text"
      },
      "history": {
        "properties": {
          "author": {
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            },
            "type": "text"
          },
          "created_at": {
            "type": "date"
          },
          "state": {
            "index": true,
            "type": "keyword"
          }
        },
        "type": "nested"
      },
      "id": {
        "index": true,
        "type": "long"
      }
    }
  },
  "order": 0,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "whitespace"
        }
      }
    },
    "index": {
      "max_inner_result_window": 1000
    },
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "refresh_interval": "1s"
  }
}'
  2. Insert first record (also creating the index):
curl -H 'Content-Type: application/json' -X POST http://localhost:9200/test_doc/_doc/1 -d '
{
    "id": 1000,
    "doc_body": "This is some body text",
    "history": [
        {
            "author": "Freddy",
            "created_at": "2019-06-23T13:10",
            "state": "created"
        },
        {
            "author": "Mike",
            "created_at": "2019-06-23T13:12",
            "state": "created"
        }
    ]
}'
  3. Perform a search on nested data:
curl -H 'Content-Type: application/json' -X POST http://localhost:9200/test_doc/_search -d '
{
    "query": {
        "bool": {
            "filter": {
                "nested": {
                    "path": "history",
                    "inner_hits": {
                        "size": 50,
                        "name": "history"
                    },
                    "query": {
                        "range": {
                            "history.created_at": {
                                "gte": "2012-01-01"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": "100",
    "sort": [
        {
            "id": "desc"
        }
    ]
}'

results in:

{"error":{"root_cause":[{"type":"null_pointer_exception","reason":null}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test_doc","node":"MZot1_RnSvqSm6g7wCuisw","reason":{"type":"null_pointer_exception","reason":null}}],"caused_by":{"type":"null_pointer_exception","reason":null,"caused_by":{"type":"null_pointer_exception","reason":null}}},"status":500}

plus a lengthy exception stack trace in the Elasticsearch log (attached).

Now, replacing the _source setting in the template:

  "mappings": {
    "_source": {
      "enabled": false
    },

with:

  "mappings": {
    "_source": {
      "includes": [
        "history.created_at"
      ]
    },

and repeating the steps works as expected.
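
Index templates are only applied when an index is created, so repeating the steps presumably means dropping the existing test_doc index first. A minimal sketch of that reset, assuming the same local node; the ES_URL variable and the function names are hypothetical:

```shell
# Base URL of the cluster (assumption: same local node as in the steps above).
ES_URL="${ES_URL:-http://localhost:9200}"

# Delete the index created in step 2 so the next insert re-creates it
# from the updated template.
reset_test_index() {
    curl -s -X DELETE "$ES_URL/test_doc"
}

# Print the mapping of the re-created index to confirm which
# _source settings it actually picked up.
show_test_mapping() {
    curl -s "$ES_URL/test_doc/_mapping"
}
```

A typical sequence would be reset_test_index, then the template PUT and the insert from the steps above, then show_test_mapping.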

Possibly also relevant, elasticsearch.yml contains:

discovery.type: single-node

For info: aggregations do not appear to be affected by this issue; they were seen to work in both cases.
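
As an illustration of the kind of aggregated search meant here, a minimal sketch of a nested terms aggregation over history.state follows. The exact aggregations used in the application are not shown in this report, so the request body and the ES_URL, AGG_BODY, and count_history_states names are assumptions.

```shell
# Base URL of the cluster (assumption: same local node as in the steps above).
ES_URL="${ES_URL:-http://localhost:9200}"

# Nested aggregation: step into the "history" objects, then bucket them
# by their "state" keyword. "size": 0 suppresses the search hits themselves.
AGG_BODY='
{
    "size": 0,
    "aggs": {
        "history": {
            "nested": {
                "path": "history"
            },
            "aggs": {
                "states": {
                    "terms": {
                        "field": "history.state"
                    }
                }
            }
        }
    }
}'

# Send the aggregation request to the test_doc index and print the response.
count_history_states() {
    curl -s -H 'Content-Type: application/json' \
        -X POST "$ES_URL/test_doc/_search" -d "$AGG_BODY"
}
```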

elastic_exception.log

Labels: :Search/Search (Search-related issues that do not fall into other categories), >bug
