Skip to content

Select global top N records with grouping by multiple fields #121

@ylwu-amzn

Description

@ylwu-amzn

Is your feature request related to a problem? Please describe.
We need to query top N records by grouping by multiple fields and sorting by doc count. Checked the explained DSL, seems no sort logic. Does that mean SQL plugin will take in the query result and sort ? In this way, the sorted result may be not globally top records? For example, we have 10K documents. The query returns 1K documents by default, then SQL plugin just sorts these 1K documents without checking the other 9K documents.

My testing query

POST _opendistro/_sql/_explain
{
  "query": "SELECT count(value) as c FROM test_data2 group by field1, field2 order by c"
}

# Response
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[c]"
    },
    "children": [
      {
        "name": "SortOperator",
        "description": {
          "sortList": {
            "count(value)": {
              "sortOrder": "ASC",
              "nullOrder": "NULL_FIRST"
            }
          }
        },
        "children": [
          {
            "name": "ElasticsearchIndexScan",
            "description": {
              "request": """ElasticsearchQueryRequest(indexName=test_data2, sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"field1":{"terms":{"field":"field1","missing_bucket":true,"order":"asc"}}},{"field2":{"terms":{"field":"field2","missing_bucket":true,"order":"asc"}}}]},"aggregations":{"count(value)":{"value_count":{"field":"value"}}}}}}, searchDone=false)"""
            },
            "children": []
          }
        ]
      }
    ]
  }
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions