release-22.2: sql: fix left semi and left anti virtual lookup joins#92880
Conversation
|
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
rytaft
left a comment
There was a problem hiding this comment.
I added a couple of questions / suggestions for tests that you could consider adding in a follow-up PR (since this is a backport).
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @msirek and @yuzefovich)
pkg/sql/virtual_table.go line 342 at r1 (raw file):
return true, nil } case descpb.LeftSemiJoin:
Do you have any tests that exercise this case for semi join? (e.g., with your previous fix before the collab session, maybe a semi join would have returned multiple results?)
pkg/sql/logictest/testdata/logic_test/pg_catalog line 4693 at r1 (raw file):
AND t.relname = 't91012' AND a.atttypid IN (SELECT oid FROM pg_type WHERE typname = ANY (ARRAY['int8']));
could we construct a similar case for anti join?
yuzefovich
left a comment
There was a problem hiding this comment.
Thanks! I opened up #92937 and will squash it into the backports.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @msirek and @rytaft)
pkg/sql/virtual_table.go line 342 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Do you have any tests that exercise this case for semi join? (e.g., with your previous fix before the collab session, maybe a semi join would have returned multiple results?)
I doubt it. I spent some time trying to come up with cases where multiple matches are looked up in the virtual index but didn't succeed (the difficulty is that we have a few virtual indexes in general and most of them appear unique, then for those that aren't unique (like crdb_internal.jobs (status) and crdb_internal.cluster_database_privileges (database_name)) I couldn't come up with a query that would get left semi join).
pkg/sql/logictest/testdata/logic_test/pg_catalog line 4693 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
could we construct a similar case for anti join?
Done in #92937.
rytaft
left a comment
There was a problem hiding this comment.
Thanks!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @msirek)
pkg/sql/virtual_table.go line 342 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I doubt it. I spent some time trying to come up with cases where multiple matches are looked up in the virtual index but didn't succeed (the difficulty is that we have a few virtual indexes in general and most of them appear unique, then for those that aren't unique (like
crdb_internal.jobs (status)andcrdb_internal.cluster_database_privileges (database_name)) I couldn't come up with a query that would get left semi join).
Ok no worries
This commit fixes the execution of the left semi and left anti virtual lookup joins. The bug was that we forgot to project away the looked up columns (coming from the "right" side) which could then lead to wrong columns being used higher up the tree. The bug was introduced during 22.1 release cycle where we added the optimizer support for generating plans that could contain left semi and left anti virtual lookup joins. This commit fixes that issue as well as the output columns of such joins (I'm not sure whether there is a user facing impact of having incorrect "output columns"). Additionally, this commit fixes the execution of these virtual lookup joins to correctly return the input row only once. Previously, for left anti joins we'd be producing an output row if there was a match (which is wrong), and for both left semi and left anti we would emit an output row every time there was a match (but this should be done only once). (Although I'm not sure whether it is possible for virtual indexes to result in multiple looked up rows.) Also, as a minor simplification this commit makes it so that the output rows are not added into the row container for left semi and left anti and the container is not instantiated at all. Release note (bug fix): CockroachDB previously could incorrectly evaluate queries that performed left semi and left anti "virtual lookup" joins on tables in `pg_catalog` or `information_schema`. These join types can be planned when a subquery is used inside of a filter condition. The bug was introduced in 22.1.0 and is now fixed.
8903ae2 to
878c29c
Compare
Backport 1/1 commits from #92713.
/cc @cockroachdb/release
This commit fixes the execution of the left semi and left anti virtual
lookup joins. The bug was that we forgot to project away the looked up
columns (coming from the "right" side) which could then lead to wrong
columns being used higher up the tree. The bug was introduced during
22.1 release cycle where we added the optimizer support for generating
plans that could contain left semi and left anti virtual lookup joins.
This commit fixes that issue as well as the output columns of such joins
(I'm not sure whether there is a user facing impact of having incorrect
"output columns").
Additionally, this commit fixes the execution of these virtual lookup
joins to correctly return the input row only once. Previously, for left
anti joins we'd be producing an output row if there was a match (which
is wrong), and for both left semi and left anti we would emit an output
row every time there was a match (but this should be done only once).
(Although I'm not sure whether it is possible for virtual indexes to
result in multiple looked up rows.)
Also, as a minor simplification this commit makes it so that the output
rows are not added into the row container for left semi and left anti
and the container is not instantiated at all.
Fixes: #91012.
Fixes: #88096.
Release note (bug fix): CockroachDB previously could incorrectly
evaluate queries that performed left semi and left anti "virtual lookup"
joins on tables in
pg_catalogorinformation_schema. These join typescan be planned when a subquery is used inside of a filter condition. The
bug was introduced in 22.1.0 and is now fixed.
Release justification: bug fix.