feat: Add MergeMaxByAggregations optimizer to combine multiple max_by calls with same second arg#26873
feat: Add MergeMaxByAggregations optimizer to combine multiple max_by calls with same second arg#26873maniloya wants to merge 1 commit into
Conversation
Reviewer's GuideIntroduces a new MergeMaxByAggregations optimizer rule that, when enabled via a new session/config property, rewrites multiple compatible MAX_BY aggregations sharing the same comparison key into a single MAX_BY over a ROW plus dereference projections, along with new plan matchers and tests to validate the behavior at rule and query levels. Sequence diagram for applying MergeMaxByAggregations during planningsequenceDiagram
participant Client
participant Coordinator
participant Planner
participant Session
participant Rule as MergeMaxByAggregations
Client->>Coordinator: submit SQL query
Coordinator->>Planner: create logical plan
Planner->>Session: get system properties
Session-->>Planner: MERGE_MAX_BY_AGGREGATIONS value
Planner->>Rule: isEnabled(session)
Rule->>Session: SystemSessionProperties.isMergeMaxByAggregationsEnabled(session)
Session-->>Rule: boolean enabled
Rule-->>Planner: enabled
alt rule enabled
Planner->>Rule: apply(aggregationNode, captures, context)
Rule->>Rule: collect MAX_BY aggregations with two args
Rule->>Rule: group by MaxByKey (comparisonKey, filter, mask, distinct)
Rule->>Rule: identify mergeable groups (size >= 2)
alt mergeable groups exist
Rule->>Rule: build bottom ProjectNode with ROW_CONSTRUCTOR expressions
Rule->>Rule: build new merged MAX_BY calls over ROW
Rule->>Rule: build AggregationNode with newAggregations
Rule->>Rule: build top ProjectNode with DEREFERENCE projections
Rule-->>Planner: Result.ofPlanNode(topProjectNode)
else no mergeable groups
Rule-->>Planner: Result.empty()
end
else rule disabled
Rule-->>Planner: not applied
end
Planner->>Coordinator: optimized plan
Coordinator-->>Client: execute query with optimized aggregations
Class diagram for MergeMaxByAggregations optimizer ruleclassDiagram
class MergeMaxByAggregations {
- String MAX_BY
- FunctionResolution functionResolution
- Pattern aggregationPattern
+ MergeMaxByAggregations(FunctionAndTypeManager functionAndTypeManager)
+ boolean isEnabled(Session session)
+ Pattern getPattern()
+ Result apply(AggregationNode node, Captures captures, Context context)
- boolean isMaxByAggregationWithTwoArgs(Aggregation aggregation)
}
class MaxByKey {
- RowExpression comparisonKey
- Optional filter
- Optional mask
- boolean isDistinct
+ MaxByKey(Aggregation aggregation)
+ boolean equals(Object other)
+ int hashCode()
}
class AggregationNode {
+ Map getAggregations()
+ List getOutputVariables()
+ PlanNode getSource()
+ List getGroupingSets()
+ List getPreGroupedVariables()
+ Step getStep()
+ Optional getHashVariable()
+ Optional getGroupIdVariable()
+ Optional getAggregationId()
}
class Aggregation {
+ RowExpression getCall()
+ Optional getFilter()
+ Optional getOrderBy()
+ boolean isDistinct()
+ Optional getMask()
}
class CallExpression {
+ String getDisplayName()
+ List getArguments()
+ Type getType()
+ FunctionHandle getFunctionHandle()
}
class ProjectNode {
+ ProjectNode(SourceLocation sourceLocation, PlanNodeId id, PlanNode source, Assignments assignments, Locality locality)
}
class FunctionResolution {
+ FunctionResolution(FunctionAndTypeResolver resolver)
+ boolean isMaxByFunction(FunctionHandle functionHandle)
+ FunctionHandle lookupBuiltInFunction(String name, List types)
}
class FunctionAndTypeManager {
+ FunctionAndTypeResolver getFunctionAndTypeResolver()
}
class Context {
+ VariableAllocator getVariableAllocator()
+ PlanNodeIdAllocator getIdAllocator()
}
class VariableAllocator {
+ VariableReferenceExpression newVariable(String hint, Type type)
+ VariableReferenceExpression newVariable(CallExpression callExpression)
}
class SystemSessionProperties {
+ static boolean isMergeMaxByAggregationsEnabled(Session session)
}
MergeMaxByAggregations --> AggregationNode : Rule~AggregationNode~
MergeMaxByAggregations --> FunctionResolution
MergeMaxByAggregations --> SystemSessionProperties : uses
MergeMaxByAggregations o-- MaxByKey : groups_by
MergeMaxByAggregations --> ProjectNode : builds
MergeMaxByAggregations --> Aggregation : merges
MergeMaxByAggregations --> CallExpression : inspects
MergeMaxByAggregations --> Context : uses
Context --> VariableAllocator
FunctionResolution --> FunctionAndTypeManager
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In
MergeMaxByAggregations.PATTERNyou detect candidates viagetCall().getDisplayName().equals("max_by"), while the main logic usesfunctionResolution.isMaxByFunction(...); consider using the same function-resolution-based check in the pattern to avoid missing or misidentifying max_by calls (e.g., if naming or resolution changes). - The
MaxByKeygrouping key currently considers only the comparison expression, filter, mask, and distinct flag; ifAggregation.getOrderBy()is non-empty, merging aggregations with different ORDER BY clauses could change semantics, so it may be safer to includeorderBy(and any other behavior-affecting properties) inMaxByKey.equals/hashCode. - The session property registration for
ADD_DISTINCT_BELOW_SEMI_JOIN_BUILDwas changed to use the four-argumentbooleanProperty(..., false)overload when addingMERGE_MAX_BY_AGGREGATIONS; double-check that the visibility/experimental flags for the existing property are intentional and unchanged in behavior.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `MergeMaxByAggregations.PATTERN` you detect candidates via `getCall().getDisplayName().equals("max_by")`, while the main logic uses `functionResolution.isMaxByFunction(...)`; consider using the same function-resolution-based check in the pattern to avoid missing or misidentifying max_by calls (e.g., if naming or resolution changes).
- The `MaxByKey` grouping key currently considers only the comparison expression, filter, mask, and distinct flag; if `Aggregation.getOrderBy()` is non-empty, merging aggregations with different ORDER BY clauses could change semantics, so it may be safer to include `orderBy` (and any other behavior-affecting properties) in `MaxByKey.equals`/`hashCode`.
- The session property registration for `ADD_DISTINCT_BELOW_SEMI_JOIN_BUILD` was changed to use the four-argument `booleanProperty(..., false)` overload when adding `MERGE_MAX_BY_AGGREGATIONS`; double-check that the visibility/experimental flags for the existing property are intentional and unchanged in behavior.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MergeMaxByAggregations.java:70-77` </location>
<code_context>
+public class MergeMaxByAggregations
+ implements Rule<AggregationNode>
+{
+ private static final String MAX_BY = "max_by";
+
+ private final FunctionResolution functionResolution;
+
+ private static final Pattern<AggregationNode> PATTERN = aggregation()
+ .matching(MergeMaxByAggregations::hasMultipleMaxBy);
+
+ private static boolean hasMultipleMaxBy(AggregationNode aggregationNode)
+ {
+ return aggregationNode.getAggregations().values().stream()
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid relying on displayName and a hard-coded string to detect MAX_BY in the pattern predicate.
The pattern uses `agg.getCall().getDisplayName().equals(MAX_BY)` while `apply()` uses `functionResolution.isMaxByFunction(...)`. This mismatch can cause the pattern to fire differently than the logic in `apply()` (e.g., with case, qualification, or future renames). Please align the predicate with the function-resolution-based approach (e.g., use the function handle rather than the display name) so the match condition and core logic stay in sync.
Suggested implementation:
```java
private final FunctionResolution functionResolution;
private final Pattern<AggregationNode> pattern;
private boolean hasMultipleMaxBy(AggregationNode aggregationNode)
{
return aggregationNode.getAggregations().values().stream()
.filter(agg -> functionResolution.isMaxByFunction(agg.getCall().getFunctionHandle()))
.count() > 1;
}
public MergeMaxByAggregations(FunctionAndTypeManager functionAndTypeManager)
{
requireNonNull(functionAndTypeManager, "functionAndTypeManager is null");
this.functionResolution = new FunctionResolution(functionAndTypeManager.getFunctionAndTypeResolver());
this.pattern = aggregation()
.matching(this::hasMultipleMaxBy);
}
```
There is likely a `getPattern()` method later in the class that currently returns the static `PATTERN` field. That method should be updated to return the new instance field instead:
1. Change `return PATTERN;` to `return pattern;` in `getPattern()`.
2. Remove the now-unused static `PATTERN` field declaration if it appears elsewhere in the file.
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MergeMaxByAggregations.java:255-257` </location>
<code_context>
+ private static class MaxByKey
+ {
+ private final RowExpression comparisonKey;
+ private final Optional<RowExpression> filter;
+ private final Optional<VariableReferenceExpression> mask;
+ private final boolean isDistinct;
+
+ public MaxByKey(Aggregation aggregation)
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider including ORDER BY in MaxByKey to avoid merging aggregations with different ordering.
MaxByKey currently keys aggregations by comparisonKey, filter, mask, and distinct, but not `Aggregation.getOrderBy()`. The merged aggregation then uses `firstAggregation.getOrderBy()`. If two MAX_BY aggregations share those fields but differ in ORDER BY, they will be merged and all will effectively use the first aggregation’s ORDER BY, changing semantics. If this can occur, either add orderBy to MaxByKey’s key fields or explicitly reject non-empty/differing ORDER BY to avoid incorrect rewrites.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MergeMaxByAggregations.java:104-106` </location>
<code_context>
+ @Override
+ public Result apply(AggregationNode node, Captures captures, Context context)
+ {
+ Map<VariableReferenceExpression, Aggregation> maxByAggregations = node.getAggregations().entrySet().stream()
+ .filter(entry -> isMaxByAggregationWithTwoArgs(entry.getValue()))
+ .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+
+ if (maxByAggregations.size() < 2) {
</code_context>
<issue_to_address>
**suggestion:** Use a map implementation that preserves insertion order explicitly instead of relying on Collectors.toMap defaults.
This stream currently collects into a default `HashMap` via `Collectors.toMap`, so the iteration order is unspecified. Since `groupedByKey` is later a `LinkedHashMap` to keep grouping order, consider collecting directly into a `LinkedHashMap` (e.g., `Collectors.toMap(..., LinkedHashMap::new, (a, b) -> a)`) or using `ImmutableMap.toImmutableMap` to ensure deterministic ordering and plan stability.
```suggestion
Map<VariableReferenceExpression, Aggregation> maxByAggregations = node.getAggregations().entrySet().stream()
.filter(entry -> isMaxByAggregationWithTwoArgs(entry.getValue()))
.collect(Collectors.toMap(
Map.Entry::getKey,
Map.Entry::getValue,
(first, second) -> first,
LinkedHashMap::new));
```
</issue_to_address>
### Comment 4
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMergeMaxByAggregations.java:42-51` </location>
<code_context>
+ .withAlias("max_v3", dereferenceInRange("max_by", 0, 2)));
+ }
+
+ @Test
+ public void testDoesNotFireWithSingleMaxBy()
+ {
+ tester().assertThat(new MergeMaxByAggregations(getMetadata().getFunctionAndTypeManager()))
+ .setSystemProperty(MERGE_MAX_BY_AGGREGATIONS, "true")
+ .on(p -> p.aggregation(af -> {
+ p.variable("v1", BIGINT);
+ p.variable("k", BIGINT);
+ af.globalGrouping()
+ .addAggregation(p.variable("max_v1"), p.rowExpression("max_by(v1, k)"))
+ .source(p.values(p.variable("v1"), p.variable("k")));
+ }))
+ .doesNotFire();
+ }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a positive-plan test for merging DISTINCT MAX_BY aggregations
You already have a negative test (`testDoesNotMergeMaxByWithDistinct`) for mixed DISTINCT/non-DISTINCT `MAX_BY`, which is good. Since the rule also groups by `isDistinct` in `MaxByKey`, please add a positive test where both `MAX_BY` aggregations are DISTINCT, share the same comparison key and filter, and are verified via `PlanMatchPattern` to merge into a single DISTINCT `max_by(ROW(...), key)` aggregation. This will help catch regressions where the DISTINCT flag is mishandled in the all-DISTINCT case.
Suggested implementation:
```java
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.aggregation;
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.column;
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.globalAggregation;
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.node;
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.project;
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.rowConstructor;
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.singleGroupingSet;
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.values;
```
```java
@Test
public void testDoesNotFireWithSingleMaxBy()
{
tester().assertThat(new MergeMaxByAggregations(getMetadata().getFunctionAndTypeManager()))
.setSystemProperty(MERGE_MAX_BY_AGGREGATIONS, "true")
.on(p -> p.aggregation(af -> {
p.variable("v1", BIGINT);
p.variable("k", BIGINT);
af.globalGrouping()
.addAggregation(p.variable("max_v1"), p.rowExpression("max_by(v1, k)"))
.source(p.values(p.variable("v1"), p.variable("k")));
}))
.doesNotFire();
}
@Test
public void testMergesDistinctMaxByWithSameKeyAndFilter()
{
tester().assertThat(new MergeMaxByAggregations(getMetadata().getFunctionAndTypeManager()))
.setSystemProperty(MERGE_MAX_BY_AGGREGATIONS, "true")
.on(p -> p.aggregation(af -> {
p.variable("v1", BIGINT);
p.variable("v2", BIGINT);
p.variable("k", BIGINT);
p.variable("f", BOOLEAN);
af.globalGrouping()
// both DISTINCT, same key and same filter
.addAggregation(p.variable("max_v1"), p.rowExpression("max_by(DISTINCT v1, k) FILTER (WHERE f)"))
.addAggregation(p.variable("max_v2"), p.rowExpression("max_by(DISTINCT v2, k) FILTER (WHERE f)"))
.source(p.values(p.variable("v1"), p.variable("v2"), p.variable("k"), p.variable("f")));
}))
.matches(
// Expect a single DISTINCT max_by over a ROW(v1, v2) and key k
project(
node(AggregationNode.class,
globalAggregation(
singleGroupingSet(),
ImmutableMap.of(
Optional.empty(),
aggregation(
"max_by",
ImmutableList.of(
rowConstructor(column("v1"), column("v2")),
column("k")))
.distinct()))))));
}
```
Depending on the exact PlanMatchPattern APIs and existing conventions in this test class, you might need to:
1. Adjust the `singleGroupingSet()` call in the matcher to explicitly specify the grouping key symbol (for example, `singleGroupingSet("k")` or `singleGroupingSet(column("k"))`) to match how other tests in this file describe grouping sets.
2. Confirm that `Optional.empty()` is an acceptable key in the `ImmutableMap` used by `globalAggregation`. If other tests key aggregations by symbol name (e.g., `Optional.of("max")`), align with that convention and use the appropriate alias.
3. If other tests use a different style for plan matching (for example, they use `aggregation(...)` directly inside `node(AggregationNode.class, ...)` without `globalAggregation(...)`), rewrite the `.matches(...)` section to mirror that pattern while still asserting:
- there is exactly one `max_by` aggregation,
- its first argument is `ROW(v1, v2)`,
- its second argument is `k`,
- and it is `.distinct()`.
These tweaks will make the new positive DISTINCT test consistent with the rest of the file and ensure it remains robust against minor plan-shape differences.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| private final Optional<RowExpression> filter; | ||
| private final Optional<VariableReferenceExpression> mask; | ||
| private final boolean isDistinct; |
There was a problem hiding this comment.
issue (bug_risk): Consider including ORDER BY in MaxByKey to avoid merging aggregations with different ordering.
MaxByKey currently keys aggregations by comparisonKey, filter, mask, and distinct, but not Aggregation.getOrderBy(). The merged aggregation then uses firstAggregation.getOrderBy(). If two MAX_BY aggregations share those fields but differ in ORDER BY, they will be merged and all will effectively use the first aggregation’s ORDER BY, changing semantics. If this can occur, either add orderBy to MaxByKey’s key fields or explicitly reject non-empty/differing ORDER BY to avoid incorrect rewrites.
a532ced to
332b90e
Compare
kaikalur
left a comment
There was a problem hiding this comment.
Also add MIN_BY in the same PR. They work the same way. And if they have the 3rd arg, make sure to merge only those with the same 3rd arg.
| Enable optimization to combine multiple :func:`!approx_distinct` function calls on expressions | ||
| of the same type into a single aggregation using ``set_agg`` with array operations (``array_constructor``, ``array_transpose``). | ||
|
|
||
| ``optimizer.merge_max_by_aggregations`` |
There was a problem hiding this comment.
Should also work the same way for min_by aggs. Change the option name to: ...max_by_and_min_by_...
332b90e to
62c1e4d
Compare
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull branch, local doc build, looks good. Thank you!
…mbine multiple max_by with same second argument, controlled by session parameter
62c1e4d to
ab558fd
Compare
There was a problem hiding this comment.
You should not need this. Once you check the name, just the function is different. Everything else is same. So just reuse the same class and call it MinByAndMaxBy. Look at the other optimization rewrite_min_max_by_to_top_n for ideas on simplifying this code.
…comparison key (#27417) ## Description Adds a new `MergeMinMaxByAggregations` optimizer rule that merges multiple `max_by` or `min_by` aggregations sharing the same comparison key into a single aggregation with a `ROW` argument and `DEREFERENCE` projections. This is a takeover of #26873, addressing all reviewer feedback from @kaikalur and Sourcery. ### Key Changes - Unified `MergeMinMaxByAggregations` class handling both `max_by` and `min_by` (per kaikalur review) - Session property renamed to `merge_max_by_and_min_by_aggregations` (per kaikalur review) - `AggregationKey` includes `orderBy` to prevent merging aggregations with different ordering (per Sourcery review) - Instance-based pattern using `FunctionResolution` instead of string comparison (per Sourcery review) - `LinkedHashMap` for deterministic plan ordering (per Sourcery review) - 17 unit tests + 10 e2e integration tests ### Example ```sql -- Before optimization: SELECT max_by(v1, k), max_by(v2, k), max_by(v3, k) FROM t -- After optimization (internal rewrite): SELECT merged[0], merged[1], merged[2] FROM (SELECT max_by(ROW(v1, v2, v3), k) AS merged FROM t) ``` ## Motivation and Context When a query contains multiple `max_by`/`min_by` aggregations with the same comparison key, each independently scans and compares values. By merging into a single `max_by(ROW(...), key)` call: 1. **Reduced computation**: One comparison per row instead of N 2. **Better memory efficiency**: Single aggregation state instead of N separate states 3. **Simpler plan**: Fewer aggregation function calls ## Impact - New session property `merge_max_by_and_min_by_aggregations` (boolean, default `false`) - No breaking changes — optimization is disabled by default ## Test Plan - 17 unit tests in `TestMergeMinMaxByAggregations` covering positive rewrites (same key, GROUP BY, mixed aggs, multiple groups, both max_by+min_by, matching filters, different value types, DISTINCT) and negative cases (single agg, different keys, disabled, 3-arg, no max_by/min_by, different filters, mixed distinct) - 10 e2e integration tests in `AbstractTestQueries#testMergeMinMaxByAggregations` comparing results with optimization enabled vs disabled - All 523 TestLocalQueries pass ## Contributor checklist - [x] Complies with contributing guide - [x] PR description addresses the issue accurately - [x] Documented new session property - [x] Adequate tests added - [x] CI passed locally ## Release Notes ``` == NO RELEASE NOTE == ``` ## Summary by Sourcery Introduce an optimizer rule to merge compatible max_by and min_by aggregations sharing the same comparison key into a single aggregation and wire it behind a configurable session property. New Features: - Add MergeMinMaxByAggregations rule to rewrite multiple max_by or min_by aggregations with the same comparison key into a single ROW-based aggregation. - Introduce the merge_max_by_and_min_by_aggregations session property and corresponding features configuration flag. Documentation: - Document the new merge_max_by_and_min_by_aggregations session property in the admin session properties documentation. Tests: - Add rule unit tests covering positive and negative cases for merging max_by/min_by aggregations, including filters, GROUP BY, DISTINCT, and mixed types. - Add integration tests ensuring queries produce identical results with the merge_max_by_and_min_by_aggregations optimization enabled or disabled.
|
Closing as abandoned — superseded by #27417 which was merged. |
…comparison key (prestodb#27417) ## Description Adds a new `MergeMinMaxByAggregations` optimizer rule that merges multiple `max_by` or `min_by` aggregations sharing the same comparison key into a single aggregation with a `ROW` argument and `DEREFERENCE` projections. This is a takeover of prestodb#26873, addressing all reviewer feedback from @kaikalur and Sourcery. ### Key Changes - Unified `MergeMinMaxByAggregations` class handling both `max_by` and `min_by` (per kaikalur review) - Session property renamed to `merge_max_by_and_min_by_aggregations` (per kaikalur review) - `AggregationKey` includes `orderBy` to prevent merging aggregations with different ordering (per Sourcery review) - Instance-based pattern using `FunctionResolution` instead of string comparison (per Sourcery review) - `LinkedHashMap` for deterministic plan ordering (per Sourcery review) - 17 unit tests + 10 e2e integration tests ### Example ```sql -- Before optimization: SELECT max_by(v1, k), max_by(v2, k), max_by(v3, k) FROM t -- After optimization (internal rewrite): SELECT merged[0], merged[1], merged[2] FROM (SELECT max_by(ROW(v1, v2, v3), k) AS merged FROM t) ``` ## Motivation and Context When a query contains multiple `max_by`/`min_by` aggregations with the same comparison key, each independently scans and compares values. By merging into a single `max_by(ROW(...), key)` call: 1. **Reduced computation**: One comparison per row instead of N 2. **Better memory efficiency**: Single aggregation state instead of N separate states 3. **Simpler plan**: Fewer aggregation function calls ## Impact - New session property `merge_max_by_and_min_by_aggregations` (boolean, default `false`) - No breaking changes — optimization is disabled by default ## Test Plan - 17 unit tests in `TestMergeMinMaxByAggregations` covering positive rewrites (same key, GROUP BY, mixed aggs, multiple groups, both max_by+min_by, matching filters, different value types, DISTINCT) and negative cases (single agg, different keys, disabled, 3-arg, no max_by/min_by, different filters, mixed distinct) - 10 e2e integration tests in `AbstractTestQueries#testMergeMinMaxByAggregations` comparing results with optimization enabled vs disabled - All 523 TestLocalQueries pass ## Contributor checklist - [x] Complies with contributing guide - [x] PR description addresses the issue accurately - [x] Documented new session property - [x] Adequate tests added - [x] CI passed locally ## Release Notes ``` == NO RELEASE NOTE == ``` ## Summary by Sourcery Introduce an optimizer rule to merge compatible max_by and min_by aggregations sharing the same comparison key into a single aggregation and wire it behind a configurable session property. New Features: - Add MergeMinMaxByAggregations rule to rewrite multiple max_by or min_by aggregations with the same comparison key into a single ROW-based aggregation. - Introduce the merge_max_by_and_min_by_aggregations session property and corresponding features configuration flag. Documentation: - Document the new merge_max_by_and_min_by_aggregations session property in the admin session properties documentation. Tests: - Add rule unit tests covering positive and negative cases for merging max_by/min_by aggregations, including filters, GROUP BY, DISTINCT, and mixed types. - Add integration tests ensuring queries produce identical results with the merge_max_by_and_min_by_aggregations optimization enabled or disabled.
Description
This PR introduces a new Presto optimizer rule MergeMaxByAggregations that merges multiple max_by aggregations sharing the same comparison key into a single max_by aggregation with a ROW argument.
Key Changes
Motivation and Context
When a query contains multiple max_by aggregations with the same comparison key, each aggregation independently scans and compares values. By merging them into a single max_by(ROW(...), key) call, we:
Impact
Test Plan
Added comprehensive unit tests in TestMergeMaxByAggregations and e2e tests in AbstractTestQueries
Contributor checklist
Release Notes