Terms aggregation shows up irrelevant data #28044

@fvilpoix

Elasticsearch version (bin/elasticsearch --version): 6.1.1

Plugins installed: analysis-icu

JVM version (java -version):

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

OS version (uname -a if on a Unix-like system):

Linux plop 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3 (2017-12-03) x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

TL;DR: Since upgrading to 6.0 (and then to the latest, 6.1.1), a terms aggregation on an integer field returns buckets for values that should not exist for the provided query.

Original post on the forum

Here is a first request I run to confirm that there is no data >= 60:

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "media_id": "aaa"
          }
        },
        {
          "range": {
            "eng.visu": {
              "gte": 60
            }
          }
        }
      ]
    }
  },
  "size": 9999
} 

eng.visu is an array of 1 to 5 integers, always < 60 for this media_id.

The result is as expected:

{
  "took": 483,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
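The reasoning behind this check can be sketched locally. The Python snippet below is a simulation, not Elasticsearch code: it assumes the usual semantics that a range query on a multi-valued numeric field matches a document if any of its values falls in the range, and that a terms aggregation counts a document once in each bucket for each distinct value it holds. The sample documents are hypothetical; only the "1 to 5 integers, always < 60" property comes from this report. Under those assumptions, zero hits for gte 60 means no terms bucket key should be 60 or more.

```python
from collections import Counter

# Hypothetical sample documents for media_id "aaa": each eng.visu is an array
# of 1 to 5 integers, all below 60 (the only property taken from the report).
docs = [
    {"media_id": "aaa", "eng": {"visu": [1, 0, 2]}},
    {"media_id": "aaa", "eng": {"visu": [1]}},
    {"media_id": "aaa", "eng": {"visu": [59, 2, 1, 0, 3]}},
]


def range_gte_hits(docs, media_id, threshold):
    """Simulated range filter: a doc matches if ANY of its values is >= threshold."""
    return [
        d for d in docs
        if d["media_id"] == media_id
        and any(v >= threshold for v in d["eng"]["visu"])
    ]


def terms_buckets(docs, media_id):
    """Simulated terms aggregation: a doc counts once per distinct value it holds."""
    counts = Counter()
    for d in docs:
        if d["media_id"] == media_id:
            counts.update(set(d["eng"]["visu"]))
    return counts


hits = range_gte_hits(docs, "aaa", 60)
buckets = terms_buckets(docs, "aaa")
print(len(hits))      # 0
print(max(buckets))   # 59: no bucket key can reach 60 if no doc has such a value
```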

But then I run a terms aggregation on the same data:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "media_id": "aaa"
          }
        }
      ]
    }
  },
  "aggs": {
    "__all__": {
      "terms": {
        "field": "eng.visu",
        "size": 9999
      }
    }
  },
  "size": 0
}

And the result:

{
  "took": 24,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 18670,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "__all__": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 690
        },
        {
          "key": 0,
          "doc_count": 674
        },
        {
          "key": 2,
          "doc_count": 655
        },
        ...
        {
          "key": 80,
          "doc_count": 298
        },
        {
          "key": 82,
          "doc_count": 298
        },
        ...
        {
          "key": 5276,
          "doc_count": 1
        }
      ]
    }
  }
}

As you can see, some bucket keys are well above 60 (80, 82, up to 5276), even though no document for this media_id should contain such values.
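To make the inconsistency easy to re-check (for example after another upgrade), the two responses can be compared mechanically. This is a minimal Python sketch assuming both responses have already been parsed into dicts (e.g. with json.loads) and that hits.total is a plain integer, as in the 6.x responses above; the function name keys_violating is hypothetical. The bucket list contains only the keys visible in the excerpt above, not the elided ones.

```python
import json


def keys_violating(range_response, agg_response, agg_name="__all__", threshold=60):
    """Return terms bucket keys >= threshold when the range check found no hits.

    If the range query did match documents, keys >= threshold are legitimate,
    so nothing is flagged.
    """
    if range_response["hits"]["total"] != 0:
        return []
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    return [b["key"] for b in buckets if b["key"] >= threshold]


# Excerpts of the two responses in this report (elided buckets omitted).
range_response = json.loads('{"hits": {"total": 0, "max_score": null, "hits": []}}')
agg_response = {
    "aggregations": {
        "__all__": {
            "buckets": [
                {"key": 1, "doc_count": 690},
                {"key": 0, "doc_count": 674},
                {"key": 2, "doc_count": 655},
                {"key": 80, "doc_count": 298},
                {"key": 82, "doc_count": 298},
                {"key": 5276, "doc_count": 1},
            ]
        }
    }
}

print(keys_violating(range_response, agg_response))  # [80, 82, 5276]
```

Any non-empty result means the terms aggregation reports values that the range filter says do not exist.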
