Skip to content

Support native non-equal lookup join planning#25519

Merged
zacw7 merged 1 commit into
prestodb:masterfrom
zacw7:non-equal-index-join
Jul 29, 2025
Merged

Support native non-equal lookup join planning#25519
zacw7 merged 1 commit into
prestodb:masterfrom
zacw7:non-equal-index-join

Conversation

@zacw7

@zacw7 zacw7 commented Jul 10, 2025

Copy link
Copy Markdown
Member

This change adds an extractor to traverse the Join plan and get lookup variables in different PlanNode, then stores the lookup variables in LookupJoinNode, which enables index lookup join with non-equal join condition for native execution. Additional changes are made to ensure lookup variables are not pruned by other optimizers.

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jul 10, 2025
@zacw7 zacw7 force-pushed the non-equal-index-join branch from ea37779 to 600e5ce Compare July 10, 2025 05:53
@zacw7 zacw7 marked this pull request as ready for review July 10, 2025 16:57
@zacw7 zacw7 requested a review from a team as a code owner July 10, 2025 16:57
@zacw7 zacw7 requested review from feilong-liu and xiaoxmeng July 10, 2025 16:57

@feilong-liu feilong-liu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a plan test related to the changes

return;
}

if (callExpression.getDisplayName().equalsIgnoreCase("CONTAINS")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check FunctionResolution.java, we have util functions for this check

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any pointers? Couldn't find it.

if (callExpression.getDisplayName().equalsIgnoreCase("BETWEEN")
&& callExpression.getArguments().size() == 3
&& (callExpression.getArguments().get(0) instanceof VariableReferenceExpression)) {
context.getLookupVariables().add((VariableReferenceExpression) callExpression.getArguments().get(0));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need the range to be constant or not? For example, a between 1 and 10? Similar for the contains below

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be constant since this util class is used for both constant and non-constant join condition.

if (expression instanceof SpecialFormExpression && ((SpecialFormExpression) expression).getForm() == SpecialFormExpression.Form.AND) {
for (RowExpression operand : ((SpecialFormExpression) expression).getArguments()) {
extractFromFilter(operand, context);
if (!context.isEligible()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this exist too early? For example, a > 3 and b between 4 and 10, it will exit when seeing a>3 and not extracting the b between 4 and 10 part?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check extractConjuncts function in LogicalRowExpressions.java, get all the conjuncts, and get the between and contains expressions?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer!

Is this exist too early?

The idea is as soon as any unsupported expression is detected, we mark the plan as ineligible for index join (currently we only support =, BETWEEN and CONTAINS)

if (node.getFilter().isPresent()) {
LookupVariableExtractor.Context commonExtractorContext = new LookupVariableExtractor.Context(new HashSet<>());
LookupVariableExtractor.extractFromFilter(node.getFilter().get(), commonExtractorContext);
if (commonExtractorContext.isEligible()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check will be false even if part of the expression is eligible? I mean 'a > 10 and b between 1 and 10'?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct. a > 10 is not supported so the whole plan is not eligible for index join

@zacw7

zacw7 commented Jul 14, 2025

Copy link
Copy Markdown
Member Author

Can you also add a plan test related to the changes

Currently c++ index join is only implemented by an internal connector where we have some unit tests covered there.

@zacw7 zacw7 requested a review from feilong-liu July 14, 2025 20:35
@zacw7 zacw7 force-pushed the non-equal-index-join branch from 600e5ce to 28376e2 Compare July 14, 2025 22:53
@kaikalur

Copy link
Copy Markdown
Contributor

How does our contains optimization (that just unnests) interop with this? Do we have performance benchmarks for the two?

@kaikalur kaikalur self-requested a review July 15, 2025 15:07
@zacw7 zacw7 force-pushed the non-equal-index-join branch from 28376e2 to 3128bfb Compare July 15, 2025 15:14
@zacw7

zacw7 commented Jul 15, 2025

Copy link
Copy Markdown
Member Author

How does our contains optimization (that just unnests) interop with this? Do we have performance benchmarks for the two?

Any pointers?

@kaikalur

Copy link
Copy Markdown
Contributor

How does our contains optimization (that just unnests) interop with this? Do we have performance benchmarks for the two?

Any pointers?

Look at optimizers like: CrossJoinWithArrayContainsToInnerJoin

@kaikalur

Copy link
Copy Markdown
Contributor

Also @feilong-liu - we should rewrite reasonable (like 1000?) BETWEEN range for int/long to array contains and let the other optimizations kick in

probeHashVariable,
indexHashVariable);
indexHashVariable,
lookupVariables);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the lookup variables? Are those used for filter pushdown and non-equal join conditions? Thanks!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For query:

SELECT
  *
FROM t1
JOIN t2
  ON t1.c1 = t2.k1
  AND t2.k2 BETWEEN t1.c2 AND 999
  AND CONTAINS(ARRAY[1, 2, 3], t2.k3)

k1, k2, k3 are the lookup variables. We store them inside the node itself so we don't have to traverse the plan again to extract them when needed.

}

// Extract equal Join keys.
List<VariableReferenceExpression> leftEqualJoinVariables;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to declare leftEqualJoinVariables/rightEqualJoinVariables here?

if (!node.getCriteria().isEmpty()) {
  leftEqualJoinVariables = node.getCriteria().stream().map(EquiJoinClause::getLeft).collect(toImmutableList());
  rightEqualJoinVariables = node.getCriteria().stream().map(EquiJoinClause::getRight).collect(toImmutableList());
  leftLookupVariables.addAll(leftEqualJoinVariables);
  rightLookupVariables.addAll(rightEqualJoinVariables);
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's needed here. They might be used later by createEquiJoinClause().

@aditi-pandit

Copy link
Copy Markdown
Contributor

@zacw7 : Thanks for this code change.

There are several failures in the tpc-ds query suite in this PR. Please take a look.

@rschlussel

Copy link
Copy Markdown
Contributor

Can you also add a plan test related to the changes

Currently c++ index join is only implemented by an internal connector where we have some unit tests covered there.

  1. this should have plan tests (you can look at how other optimizers are tested)
  2. You can add simple index join support to an open source connector for testing. For example, in Java we have an IndexedTpchConnector to test index queries even though none of the other included connectors support index join https://github.com/prestodb/presto/blob/master/presto-tests/src/main/java/com/facebook/presto/tests/tpch/IndexedTpchConnectorFactory.java. If there's open source functionality we depend on there need to be open source tests. Otherwise nobody will know if they break it until after a change is already merged.

@zacw7 zacw7 force-pushed the non-equal-index-join branch 3 times, most recently from 5e666ec to f97e8df Compare July 24, 2025 02:36
@zacw7

zacw7 commented Jul 24, 2025

Copy link
Copy Markdown
Member Author

Can you also add a plan test related to the changes

Currently c++ index join is only implemented by an internal connector where we have some unit tests covered there.

  1. this should have plan tests (you can look at how other optimizers are tested)
  2. You can add simple index join support to an open source connector for testing. For example, in Java we have an IndexedTpchConnector to test index queries even though none of the other included connectors support index join https://github.com/prestodb/presto/blob/master/presto-tests/src/main/java/com/facebook/presto/tests/tpch/IndexedTpchConnectorFactory.java. If there's open source functionality we depend on there need to be open source tests. Otherwise nobody will know if they break it until after a change is already merged.

Thanks for the suggestion. I've added TestNativeIndexJoinLogicalPlanner.

This change adds an extractor to traverse the Join plan and get lookup variables in different PlanNode, then stores the lookup variables in LookupJoinNode, which enables index lookup join with non-equal join condition for native execution. Additional changes are made to ensure lookup variables are not pruned by other optimizers.
@zacw7 zacw7 force-pushed the non-equal-index-join branch from f97e8df to 470783e Compare July 24, 2025 02:45
@feilong-liu feilong-liu requested a review from rschlussel July 25, 2025 04:26
@zacw7 zacw7 merged commit 45c3305 into prestodb:master Jul 29, 2025
108 checks passed
@zacw7 zacw7 deleted the non-equal-index-join branch July 29, 2025 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants