Support native non-equal lookup join planning#25519
Conversation
ea37779 to
600e5ce
Compare
| return; | ||
| } | ||
|
|
||
| if (callExpression.getDisplayName().equalsIgnoreCase("CONTAINS") |
There was a problem hiding this comment.
Check FunctionResolution.java, we have util functions for this check
There was a problem hiding this comment.
Any pointers? Couldn't find it.
| if (callExpression.getDisplayName().equalsIgnoreCase("BETWEEN") | ||
| && callExpression.getArguments().size() == 3 | ||
| && (callExpression.getArguments().get(0) instanceof VariableReferenceExpression)) { | ||
| context.getLookupVariables().add((VariableReferenceExpression) callExpression.getArguments().get(0)); |
There was a problem hiding this comment.
Do you need the range to be constant or not? For example, a between 1 and 10? Similar for the contains below
There was a problem hiding this comment.
It doesn't have to be constant since this util class is used for both constant and non-constant join condition.
| if (expression instanceof SpecialFormExpression && ((SpecialFormExpression) expression).getForm() == SpecialFormExpression.Form.AND) { | ||
| for (RowExpression operand : ((SpecialFormExpression) expression).getArguments()) { | ||
| extractFromFilter(operand, context); | ||
| if (!context.isEligible()) { |
There was a problem hiding this comment.
Is this exist too early? For example, a > 3 and b between 4 and 10, it will exit when seeing a>3 and not extracting the b between 4 and 10 part?
There was a problem hiding this comment.
Maybe check extractConjuncts function in LogicalRowExpressions.java, get all the conjuncts, and get the between and contains expressions?
There was a problem hiding this comment.
Thanks for the pointer!
Is this exist too early?
The idea is as soon as any unsupported expression is detected, we mark the plan as ineligible for index join (currently we only support =, BETWEEN and CONTAINS)
| if (node.getFilter().isPresent()) { | ||
| LookupVariableExtractor.Context commonExtractorContext = new LookupVariableExtractor.Context(new HashSet<>()); | ||
| LookupVariableExtractor.extractFromFilter(node.getFilter().get(), commonExtractorContext); | ||
| if (commonExtractorContext.isEligible()) { |
There was a problem hiding this comment.
This check will be false even if part of the expression is eligible? I mean 'a > 10 and b between 1 and 10'?
There was a problem hiding this comment.
that's correct. a > 10 is not supported so the whole plan is not eligible for index join
Currently c++ index join is only implemented by an internal connector where we have some unit tests covered there. |
600e5ce to
28376e2
Compare
|
How does our contains optimization (that just unnests) interop with this? Do we have performance benchmarks for the two? |
28376e2 to
3128bfb
Compare
Any pointers? |
Look at optimizers like: CrossJoinWithArrayContainsToInnerJoin |
|
Also @feilong-liu - we should rewrite reasonable (like 1000?) BETWEEN range for int/long to array contains and let the other optimizations kick in |
| probeHashVariable, | ||
| indexHashVariable); | ||
| indexHashVariable, | ||
| lookupVariables); |
There was a problem hiding this comment.
What's the lookup variables? Are those used for filter pushdown and non-equal join conditions? Thanks!
There was a problem hiding this comment.
For query:
SELECT
*
FROM t1
JOIN t2
ON t1.c1 = t2.k1
AND t2.k2 BETWEEN t1.c2 AND 999
AND CONTAINS(ARRAY[1, 2, 3], t2.k3)k1, k2, k3 are the lookup variables. We store them inside the node itself so we don't have to traverse the plan again to extract them when needed.
| } | ||
|
|
||
| // Extract equal Join keys. | ||
| List<VariableReferenceExpression> leftEqualJoinVariables; |
There was a problem hiding this comment.
We don't need to declare leftEqualJoinVariables/rightEqualJoinVariables here?
if (!node.getCriteria().isEmpty()) {
leftEqualJoinVariables = node.getCriteria().stream().map(EquiJoinClause::getLeft).collect(toImmutableList());
rightEqualJoinVariables = node.getCriteria().stream().map(EquiJoinClause::getRight).collect(toImmutableList());
leftLookupVariables.addAll(leftEqualJoinVariables);
rightLookupVariables.addAll(rightEqualJoinVariables);
}
There was a problem hiding this comment.
It's needed here. They might be used later by createEquiJoinClause().
|
@zacw7 : Thanks for this code change. There are several failures in the tpc-ds query suite in this PR. Please take a look. |
|
5e666ec to
f97e8df
Compare
Thanks for the suggestion. I've added |
This change adds an extractor to traverse the Join plan and get lookup variables in different PlanNode, then stores the lookup variables in LookupJoinNode, which enables index lookup join with non-equal join condition for native execution. Additional changes are made to ensure lookup variables are not pruned by other optimizers.
f97e8df to
470783e
Compare
This change adds an extractor to traverse the Join plan and get lookup variables in different PlanNode, then stores the lookup variables in LookupJoinNode, which enables index lookup join with non-equal join condition for native execution. Additional changes are made to ensure lookup variables are not pruned by other optimizers.