-
Notifications
You must be signed in to change notification settings - Fork 4.1k
opt: incorrect results due to cross-joining input of semi-join #78681
Description
GenerateLookupJoins can cross-join the input of the join with a set of constant values, from a CHECK constraint, in order to constrain index prefix columns. This enables lookup joins to be generated in more cases. It is valid for inner joins, but it is not valid for left, anti, and semi joins. We currently disable this only for left and anti-joins, but not semi-joins. This incorrect transformation can cause incorrect query results.
Example:
CREATE TABLE a (
a INT NOT NULL
);
CREATE TABLE bc (
b INT NOT NULL,
c INT NOT NULL,
CHECK (b IN (1, 2, 3)),
INDEX (b, c)
);
# Insert stats so that a lookup semi-join is selected.
statement ok
ALTER TABLE a INJECT STATISTICS '[
{
"columns": ["a"],
"created_at": "2018-05-01 1:00:00.00000+00:00",
"row_count": 10,
"distinct_count": 10
}
]';
ALTER TABLE bc INJECT STATISTICS '[
{
"columns": ["b"],
"created_at": "2018-05-01 1:00:00.00000+00:00",
"row_count": 10000000,
"distinct_count": 3
}
]'
INSERT INTO a VALUES (1);
INSERT INTO bc VALUES (1, 1), (2, 1), (3, 1);
SELECT a FROM a WHERE EXISTS (SELECT * FROM bc WHERE a = c);
----
1
1
1The query plan reveals the problem. Notice how the input to the semi-join is cross-joined with all the possible values of b. This incorrectly increases the number of rows output by the join.
EXPLAIN (OPT, VERBOSE) SELECT a FROM a WHERE EXISTS (SELECT * FROM bc WHERE a =
c);
----
semi-join (lookup bc@bc_b_c_idx)
├── columns: a:1
├── key columns: [11 1] = [5 6]
├── stats: [rows=10, distinct(1)=10, null(1)=0, avgsize(1)=4]
├── cost: 844.7775
├── distribution: test
├── inner-join (cross)
│ ├── columns: a:1 "lookup_join_const_col_@5":11
│ ├── multiplicity: left-rows(one-or-more), right-rows(zero-or-more)
│ ├── stats: [rows=30, distinct(1)=10, null(1)=0, avgsize(1)=4, distinct(11)=3, null(11)=0, avgsize(11)=4]
│ ├── cost: 35.7575
│ ├── distribution: test
│ ├── scan a
│ │ ├── columns: a:1
│ │ ├── stats: [rows=10, distinct(1)=10, null(1)=0, avgsize(1)=4]
│ │ ├── cost: 35.22
│ │ ├── distribution: test
│ │ ├── prune: (1)
│ │ └── unfiltered-cols: (1-4)
│ ├── values
│ │ ├── columns: "lookup_join_const_col_@5":11
│ │ ├── cardinality: [3 - 3]
│ │ ├── stats: [rows=3, distinct(11)=3, null(11)=0, avgsize(11)=4]
│ │ ├── cost: 0.04
│ │ ├── distribution: test
│ │ ├── (1,)
│ │ ├── (2,)
│ │ └── (3,)
│ └── filters (true)
└── filters (true)
Version Affected
This bug has been present since version 21.1.0 and affects all later releases.
Workaround
The best known workaround is 1) determine the prefix columns in the lookup join for which the cross join is created (b in the example above) and 2) make that column nullable or remove the CHECK constraint. This will prevent the incorrect lookup join from being generated.
Jira issue: CRDB-14215
gz#11864
Metadata
Metadata
Assignees
Labels
Type
Projects
Status