BUGFIX: DELIM_JOINS should reflect functionality of NULL filtering conditions in joins with DELIM_GETS#16910
Merged
Mytherin merged 8 commits intoduckdb:mainfrom May 12, 2025
Merged
Conversation
…y join as a join child
Contributor
Author
|
CC @lnkuiper |
lnkuiper
reviewed
Mar 31, 2025
Collaborator
lnkuiper
left a comment
There was a problem hiding this comment.
LGTM (fingers crossed for CI)
Collaborator
|
Looks like just a missing |
!= join condition!= join condition
…n type is not a mark join type
!= join condition!= join condition
!= join condition|
Thanks for your effort in fixing! |
lnkuiper
approved these changes
May 12, 2025
Collaborator
lnkuiper
left a comment
There was a problem hiding this comment.
Thanks for the changes, it looks much better now! Now we won't bail out of the optimization as often.
Collaborator
|
Thanks! |
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 18, 2025
BUGFIX: DELIM_JOINS should reflect functionality of NULL filtering conditions in joins with DELIM_GETS (duckdb/duckdb#16910)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 18, 2025
BUGFIX: DELIM_JOINS should reflect functionality of NULL filtering conditions in joins with DELIM_GETS (duckdb/duckdb#16910)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 19, 2025
BUGFIX: DELIM_JOINS should reflect functionality of NULL filtering conditions in joins with DELIM_GETS (duckdb/duckdb#16910)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 19, 2025
BUGFIX: DELIM_JOINS should reflect functionality of NULL filtering conditions in joins with DELIM_GETS (duckdb/duckdb#16910)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fixes #16803
Take a look at the plan produced by the BUG
Normally, if you have a join condition that is not (COMPARE NOT DISTINCT FROM) with DELIM_GET, we can remove the DELIM_GET and push a
IS NOT NULLfilter on the non-DELIM_GET child (see deliminator.cpp::258).For all join conditions that filter nulls this makes sense. The only one where it is confusing is COMPARE DISTINCT FROM, but that is a non-null-filtering INEQUALITY join condition. For all inequality joins with DELIM Gets, there is some special logic, see Deliminator.cpp::RemoveInequalityJoinWithDelimGet. The special logic flips the DELIM_JOIN condition to act as the inequality join while also switching the column binding of the delim join conditio to bind to the non-DELIM_GET column of the inequality join.
So with the flipped DELIM_JOIN condition, and removal of the DELIM_GET, we have the plan
The DELIM_JOIN is now a comparison join, which checks if
c1 is distinct from c0(i.e the inequality join). Now it's a bit more clear what is missing. What if t1 has a NULL value? It could be distinct from c0 and satisfy the inequalityness that we are looking for. However, in our original query, the NULL value of c1 joined with c0 never propagates out of the RHS, so the c1 value should not propagate on the RHS either. Also, considering this is an ANTI JOIN, that means NULL is always distinct from values in t0 since there is aIS NOT NULLfilter.Another way to look at it is that the
!=comparison filters out all NULLS, we push a IS NOT NULL on the GET that is not the DELIM GET, but we should then also push aIS NOT NULLon the DELIM_GET, but we don't.The fix I've come up with is to change the
IS DISTINCT FROMto a!=to respect NULL values in this case. Also the other way around where aNOT DISTINCT FROMgenerated by aEXISTSsubquery with an=condition should be changed to a=condition.We don't do this switch for DELIM_JOINS with mark joins because if the condition evaluates to NULL, the MARK JOIN column that is output is NULL, when an EXISTS is always true or false (I think this is a separate bug though, and we should be able to apply the fix in this PR regardless of delim join type)
Also have discovered that just disabling the optimization when the JOIN with the DELIM_GET has a != condition will result in a regression on q21 of tpch. See #4950