[FEATURE] Enhancement on agg with dependent bucket fields

**Is your feature request related to a problem?**
Clickbench Query q36:
```
source=hits | eval `ClientIP - 1` = ClientIP - 1, `ClientIP - 2` = ClientIP - 2, `ClientIP - 3` = ClientIP - 3 | stats count() as c by `ClientIP`, `ClientIP - 1`, `ClientIP - 2`, `ClientIP - 3` | sort - c | head 10
```

currently gets plan:
```
Explain: {
  "calcite": {
    "logical": "LogicalSystemLimit(sort0=[$0], dir0=[DESC-nulls-last], fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[10])
    LogicalProject(c=[$4], ClientIP=[$0], ClientIP - 1=[$1], ClientIP - 2=[$2], ClientIP - 3=[$3])
      LogicalAggregate(group=[{0, 1, 2, 3}], c=[COUNT()])
        LogicalProject(ClientIP=[$76], ClientIP - 1=[-($76, 1)], ClientIP - 2=[-($76, 2)], ClientIP - 3=[-($76, 3)])
          CalciteLogicalIndexScan(table=[[OpenSearch, hits]])
",
    "physical": "EnumerableLimit(fetch=[10000])
  EnumerableCalc(expr#0..4=[{inputs}], c=[$t4], ClientIP=[$t0], ClientIP - 1=[$t1], ClientIP - 2=[$t2], ClientIP - 3=[$t3])
    EnumerableLimit(fetch=[10])
      EnumerableSort(sort0=[$4], dir0=[DESC-nulls-last])
        CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[AGGREGATION->rel#6613:LogicalAggregate.NONE.[](input=RelSubset#6612,group={0, 1, 2, 3},c=COUNT())], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"ClientIP\":{\"terms\":{\"field\":\"ClientIP\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}},{\"ClientIP - 1\":{\"terms\":{\"script\":
...
...
,\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"c\":{\"value_count\":{\"field\":\"_index\"}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
"
  }
}
```
which contains multiple buckets with script.

However, the bucket fields `ClientIP`, `ClientIP - 1`, `ClientIP - 2`, `ClientIP - 3` should  be equivalent on group splitting, which means it should make no difference between using any one of them as the group field with using all of them. If we optimize the plan to use only 1 bucket field, e.g. `ClientIP`, it could avoid script and multiple-level buckets, which both contribute to better performance.


**What solution would you like?**
Implement a new rule to simplify plan with single field `ClientIP` in agg and create a project upon original agg to add other bucket fields back. The plan could be presented by query like 
```
source=hits | stats stats count() as c by `ClientIP`| eval ClientIP - 1` = ClientIP - 1, `ClientIP - 2` = ClientIP - 2, `ClientIP - 3` = ClientIP - 3 | ...
```

**What alternatives have you considered?**
A clear and concise description of any alternative solutions or features you've considered.

**Do you have any additional context?**
Add any other context or screenshots about the feature request here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Enhancement on agg with dependent bucket fields #4681

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[FEATURE] Enhancement on agg with dependent bucket fields #4681

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions