Support nested aggregation when calcite enabled#4979
Support nested aggregation when calcite enabled#4979qianheng-aws merged 9 commits intoopensearch-project:mainfrom
Conversation
Signed-off-by: Lantao Jin <ltjin@amazon.com>
📝 WalkthroughWalkthroughAdds nested-aggregation support and hint-based aggregation metadata across Calcite planning and OpenSearch pushdown layers, including nested-path resolution utilities, new hint utilities, a converter rule and physical nested aggregate, pushdown adaptations, response parsing updates, and integration tests/resources. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant CalciteRelNodeVisitor
participant PPLHintUtils
participant CalciteLogicalIndexScan
participant EnumerableNestedAggregateRule
participant AggregateAnalyzer
participant OpenSearch
Client->>CalciteRelNodeVisitor: visitAggregation(rel)
CalciteRelNodeVisitor->>CalciteRelNodeVisitor: collect aggCallRefs
CalciteRelNodeVisitor->>CalciteRelNodeVisitor: compute hintIgnoreNullBucket
alt nested aggregator detected
CalciteRelNodeVisitor->>PPLHintUtils: addNestedAggCallHintToAggregate(relBuilder)
end
alt hintIgnoreNullBucket true
CalciteRelNodeVisitor->>PPLHintUtils: addIgnoreNullBucketHintToAggregate(relBuilder)
end
CalciteRelNodeVisitor->>CalciteLogicalIndexScan: pushDownAggregate(aggregate, hints)
CalciteLogicalIndexScan->>CalciteLogicalIndexScan: buildSchema() / resolveNestedPath()
CalciteLogicalIndexScan->>EnumerableNestedAggregateRule: rule matching (hints)
EnumerableNestedAggregateRule->>EnumerableNestedAggregateRule: convert -> CalciteEnumerableNestedAggregate
CalciteLogicalIndexScan->>AggregateAnalyzer: analyze(agg, bucketNames, nested info)
AggregateAnalyzer->>AggregateAnalyzer: resolveNestedPath & build aggregations (may wrap NestedAggregationBuilder)
CalciteLogicalIndexScan->>OpenSearch: execute pushdown with nested aggregation
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90+ minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
Comment |
| protected Function0<CalcitePrepare> createPrepareFactory() { | ||
| return OpenSearchPrepareImpl::new; |
There was a problem hiding this comment.
After upgraded to 1.41. this method was not called any more, change it to createPrepare()
|
|
||
| @Override | ||
| protected PreparedResult implement(RelRoot root) { | ||
| Hook.PLAN_BEFORE_IMPLEMENTATION.run(root); |
There was a problem hiding this comment.
This hook was called twice for non-full-scannable plan
core/src/main/java/org/opensearch/sql/calcite/utils/PPLHintStrategyTable.java
Outdated
Show resolved
Hide resolved
| curl -sS -H 'Content-Type: application/json' \ | ||
| -X POST localhost:9200/_plugins/_ppl/_explain?format=extended \ | ||
| -d '{"query" : "source=state_country | where age>30"}' | ||
| -d '{"query" : "source=state_country | head 10 | where age>30"}' |
There was a problem hiding this comment.
avoid whole-plan-pushdown
| if (result.getCause() == null) { | ||
| if (result.getSuppressed().length > 0) { | ||
| return result.getSuppressed()[0]; | ||
| } |
There was a problem hiding this comment.
For some exception without cause (root exception), the actual root cause may be attached in its suppressed cause.
| return new OpenSearchErrorMessage(exception, exception.status().getStatus()); | ||
| } | ||
| return new ErrorMessage(e, status); | ||
| return new ErrorMessage(cause, status); |
There was a problem hiding this comment.
Prefer to display the root cause instead of wrapping exception.
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
| // Used to track the current sub-builder as analysis progresses | ||
| Builder subBuilder = newMetricBuilder; | ||
| // Push auto date span & case in group-by list into nested aggregations | ||
| // Push auto date span & case in group-by list into structured aggregations |
There was a problem hiding this comment.
To avoid ambiguous, we call the multiple sub-aggregations structured aggregations instead of nested
|
|
||
| private static Pair<Builder, List<MetricParser>> processAggregateCalls( | ||
| List<String> aggFieldNames, | ||
| List<String> aggNames, |
There was a problem hiding this comment.
Here the value could be min(a.b), count(c), rename aggFieldNames to aggNames.
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4979-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 77633ef589d839c57c4c00fdc8a86d70a02d74d8
# Push it to GitHub
git push --set-upstream origin backport/backport-4979-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
) * refactor: throw exception if pushdown cannot be applied Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix tests Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix IT Signed-off-by: Lantao Jin <ltjin@amazon.com> * Support top/dedup/aggregate by nested sub-fields Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix typo Signed-off-by: Lantao Jin <ltjin@amazon.com> * address comments Signed-off-by: Lantao Jin <ltjin@amazon.com> * minor fixing Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com> (cherry picked from commit 77633ef)
* refactor: throw exception if pushdown cannot be applied * fix tests * fix IT * Support top/dedup/aggregate by nested sub-fields * fix typo * address comments * minor fixing --------- (cherry picked from commit 77633ef) Signed-off-by: Lantao Jin <ltjin@amazon.com>
) * refactor: throw exception if pushdown cannot be applied Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix tests Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix IT Signed-off-by: Lantao Jin <ltjin@amazon.com> * Support top/dedup/aggregate by nested sub-fields Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix typo Signed-off-by: Lantao Jin <ltjin@amazon.com> * address comments Signed-off-by: Lantao Jin <ltjin@amazon.com> * minor fixing Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com> Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Description
Refactor implementation for PPL: #3696 (closed)
Deprecated implementation for SQL: #2814 (closed)
Support nested aggregation in PPL only when calcite enabled.
With this PR, follow PPL query is able to execute a nested aggregation query.
And it equals the DSL
Follow queries (group-by nested path) are supported with this PR as well:
Limitation:
Related Issues
Resolves #4949, #4564, #2813 and #2529, and #2739
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.