Skip to content

Support nested aggregation when calcite enabled#4979

Merged
qianheng-aws merged 9 commits intoopensearch-project:mainfrom
LantaoJin:pr/issues/2739_new
Jan 5, 2026
Merged

Support nested aggregation when calcite enabled#4979
qianheng-aws merged 9 commits intoopensearch-project:mainfrom
LantaoJin:pr/issues/2739_new

Conversation

@LantaoJin
Copy link
Copy Markdown
Member

@LantaoJin LantaoJin commented Dec 19, 2025

Description

Refactor implementation for PPL: #3696 (closed)
Deprecated implementation for SQL: #2814 (closed)

Support nested aggregation in PPL only when calcite enabled.

With this PR, follow PPL query is able to execute a nested aggregation query.

source=logs | head 10000 | stats min(pages.load_time)

And it equals the DSL

GET logs/_search
{
  "aggs": {
    "pages": {
      "nested": {
        "path": "pages"
      },
      "aggs": {
        "min_load_time": { "min": { "field": "pages.load_time" } }
      }
    }
  }
}

Follow queries (group-by nested path) are supported with this PR as well:

source=test | top pages.load_time
source=test | stats count() by pages.load_time
source=test | dedup pages.load_time

Limitation:

  • PPL only
  • Calcite should be enabled
  • Throw UnsupportedOperationException if pushdown cannot be applied.
  • Follow queries (group-by nested root path) are supported with this PR without pushdown enahncement:
source=test | top pages
source=test | stats count() by pages
source=test | dedup pages

Related Issues

Resolves #4949, #4564, #2813 and #2529, and #2739

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 19, 2025

📝 Walkthrough

Walkthrough

Adds nested-aggregation support and hint-based aggregation metadata across Calcite planning and OpenSearch pushdown layers, including nested-path resolution utilities, new hint utilities, a converter rule and physical nested aggregate, pushdown adaptations, response parsing updates, and integration tests/resources.

Changes

Cohort / File(s) Summary
Calcite hint utils & visitor
core/src/main/java/org/opensearch/sql/calcite/utils/PPLHintUtils.java, core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
New PPLHintUtils to attach/query ignoreNullBucket and hasNestedAggCall hints; CalciteRelNodeVisitor now computes and propagates hintIgnoreNullBucket, detects nested aggregators, and applies both hints during aggregation planning.
Calcite prepare & removed strategy
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java, core/src/main/java/org/opensearch/sql/calcite/utils/PPLHintStrategyTable.java
Replaced protected factory API with public createPrepare(); removed standalone PPLHintStrategyTable (memoized table moved/handled in PPLHintUtils).
Plan utilities & schema helpers
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java, core/src/main/java/org/opensearch/sql/utils/Utils.java, core/src/test/java/org/opensearch/sql/utils/UtilsTest.java
Removed legacy add-ignore-null hint helper; added Utils.resolveNestedPath and zipWithIndex plus unit tests for nested-path resolution.
Index scan & rule registry
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java, opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
buildSchema() added to include nested-paths; pushdown rule registration updated to separate non-pushdown/pushdown rule sets; bucket nullability now derived via PPLHintUtils.
New nested-agg conversion & physical rel
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java, opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableNestedAggregate.java
Added ConverterRule to convert hinted LogicalAggregate to a new CalciteEnumerableNestedAggregate physical rel (validation, cost tweak, implement() unimplemented).
Aggregate analysis & pushdown actions
opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java, opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateFilterAnalyzer.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java
Renamed aggFieldName→aggName; integrated Utils.resolveNestedPath to detect nested paths; AggPushDownAction unwraps/wraps NestedAggregationBuilder and adjusts composite/nested handling across push/re-push logic.
Predicate & response parsing
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java, opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/BucketAggregationParser.java, opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/MetricParserHelper.java
NamedFieldExpression gains nestedPath (via Utils.resolveNestedPath); SimpleQueryExpression emits nestedQuery when present; parsers unwrap Nested/InternalNested aggregations before parsing.
Dedup & other planner rule tweaks
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
Dedup eligibility broadened to ARRAY/MAP; prevents pushdown when dedup columns resolve to nested paths; hint handling switched to PPLHintUtils.
Response/error handling & test support
opensearch/src/main/java/org/opensearch/sql/opensearch/response/error/ErrorMessageFactory.java, opensearch/src/test/java/org/opensearch/sql/opensearch/response/AggregationResponseUtils.java
Improved cause unwrapping and wrapping for non-OpenSearch exceptions; AggregationResponseUtils registers Nested parsing support.
Docs, tests & resources
docs/...; integ-test/src/... (many explain YAML/JSON, nested_simple mapping & data), integ-test/src/test/java/.../CalcitePPLNestedAggregationIT.java, integ-test/src/test/java/.../CalciteExplainIT.java, others
Added/updated integration tests and expected explain outputs for nested aggregation scenarios; adjusted various integration tests and fixtures; documentation updates to explain examples and limitations.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant CalciteRelNodeVisitor
    participant PPLHintUtils
    participant CalciteLogicalIndexScan
    participant EnumerableNestedAggregateRule
    participant AggregateAnalyzer
    participant OpenSearch

    Client->>CalciteRelNodeVisitor: visitAggregation(rel)
    CalciteRelNodeVisitor->>CalciteRelNodeVisitor: collect aggCallRefs
    CalciteRelNodeVisitor->>CalciteRelNodeVisitor: compute hintIgnoreNullBucket
    alt nested aggregator detected
        CalciteRelNodeVisitor->>PPLHintUtils: addNestedAggCallHintToAggregate(relBuilder)
    end
    alt hintIgnoreNullBucket true
        CalciteRelNodeVisitor->>PPLHintUtils: addIgnoreNullBucketHintToAggregate(relBuilder)
    end
    CalciteRelNodeVisitor->>CalciteLogicalIndexScan: pushDownAggregate(aggregate, hints)
    CalciteLogicalIndexScan->>CalciteLogicalIndexScan: buildSchema() / resolveNestedPath()
    CalciteLogicalIndexScan->>EnumerableNestedAggregateRule: rule matching (hints)
    EnumerableNestedAggregateRule->>EnumerableNestedAggregateRule: convert -> CalciteEnumerableNestedAggregate
    CalciteLogicalIndexScan->>AggregateAnalyzer: analyze(agg, bucketNames, nested info)
    AggregateAnalyzer->>AggregateAnalyzer: resolveNestedPath & build aggregations (may wrap NestedAggregationBuilder)
    CalciteLogicalIndexScan->>OpenSearch: execute pushdown with nested aggregation
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

Suggested labels

PPL, pushdown

Suggested reviewers

  • penghuo
  • anirudha
  • derek-ho
  • kavithacm
  • joshuali925

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 19.42% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Support nested aggregation when calcite enabled' accurately describes the main change: adding nested aggregation support for PPL when Calcite is enabled.
Linked Issues check ✅ Passed The PR code changes implement nested aggregation support for PPL when Calcite is enabled, directly addressing issue #4949 which requires bucket-dependent aggregations (top, stats by, dedup) to work correctly on nested fields.
Out of Scope Changes check ✅ Passed All changes are scoped to implementing nested aggregation support: hint utilities, rule additions, aggregation analyzers, schema handling, and comprehensive tests aligned with the PR objectives.
Description check ✅ Passed The pull request description clearly explains the feature (nested aggregation support in PPL), provides examples of supported queries, documents known limitations, and references related issues.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin added the enhancement New feature or request label Dec 20, 2025
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Comment on lines -178 to -179
protected Function0<CalcitePrepare> createPrepareFactory() {
return OpenSearchPrepareImpl::new;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After upgraded to 1.41. this method was not called any more, change it to createPrepare()


@Override
protected PreparedResult implement(RelRoot root) {
Hook.PLAN_BEFORE_IMPLEMENTATION.run(root);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hook was called twice for non-full-scannable plan

curl -sS -H 'Content-Type: application/json' \
-X POST localhost:9200/_plugins/_ppl/_explain?format=extended \
-d '{"query" : "source=state_country | where age>30"}'
-d '{"query" : "source=state_country | head 10 | where age>30"}'
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid whole-plan-pushdown

Comment on lines 36 to +39
if (result.getCause() == null) {
if (result.getSuppressed().length > 0) {
return result.getSuppressed()[0];
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some exception without cause (root exception), the actual root cause may be attached in its suppressed cause.

return new OpenSearchErrorMessage(exception, exception.status().getStatus());
}
return new ErrorMessage(e, status);
return new ErrorMessage(cause, status);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to display the root cause instead of wrapping exception.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
// Used to track the current sub-builder as analysis progresses
Builder subBuilder = newMetricBuilder;
// Push auto date span & case in group-by list into nested aggregations
// Push auto date span & case in group-by list into structured aggregations
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid ambiguous, we call the multiple sub-aggregations structured aggregations instead of nested


private static Pair<Builder, List<MetricParser>> processAggregateCalls(
List<String> aggFieldNames,
List<String> aggNames,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the value could be min(a.b), count(c), rename aggFieldNames to aggNames.

@qianheng-aws qianheng-aws merged commit 77633ef into opensearch-project:main Jan 5, 2026
44 of 45 checks passed
@opensearch-trigger-bot
Copy link
Copy Markdown
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4979-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 77633ef589d839c57c4c00fdc8a86d70a02d74d8
# Push it to GitHub
git push --set-upstream origin backport/backport-4979-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4979-to-2.19-dev.

LantaoJin added a commit to LantaoJin/search-plugins-sql that referenced this pull request Jan 5, 2026
)

* refactor: throw exception if pushdown cannot be applied

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix tests

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support top/dedup/aggregate by nested sub-fields

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* address comments

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* minor fixing

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit 77633ef)
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Jan 5, 2026
LantaoJin added a commit that referenced this pull request Jan 5, 2026
* refactor: throw exception if pushdown cannot be applied



* fix tests



* fix IT



* Support top/dedup/aggregate by nested sub-fields



* fix typo



* address comments



* minor fixing



---------


(cherry picked from commit 77633ef)

Signed-off-by: Lantao Jin <ltjin@amazon.com>
aalva500-prog pushed a commit to aalva500-prog/sql that referenced this pull request Jan 12, 2026
)

* refactor: throw exception if pushdown cannot be applied

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix tests

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support top/dedup/aggregate by nested sub-fields

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* address comments

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* minor fixing

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] top and other bucket-dependent aggregations don't work on nested fields

3 participants