Split a disjunction of equijoin predicates into a union of joins

**Is your feature request related to a problem? Please describe.**
Queries with ORed equality join predicates always use cross join, which can be slow.

**Describe the solution you'd like**
Teach the optimizer to split the join into multiple joins, one per ORed equality join predicate so each join can use a more performant join method.

**Describe alternatives you've considered**
None

**Additional context**
Here is an example test case (students signing up for classes with each signup request having a first choice and a second choice) :
```
CREATE TABLE classes (classID int, description varchar(50)
                      , primary key(classID)
);

CREATE TABLE classRequest (studentID int, firstChoiceClassID int, secondChoiceClassID int, primary key(studentID));
CREATE INDEX firstIdx ON classRequest(firstChoiceClassID) storing(secondChoiceClassID);
CREATE INDEX secondIdx ON classRequest(secondChoiceClassID) storing(firstChoiceClassID);

insert into classes select g, 'dummy description' FROM generate_series(1,1000) g(g);
insert into classRequest select g,cast(random()*900 as int)+1, cast(random()*100 as int)+1 FROM generate_series(1,100000) g(g);

analyze classes;
analyze classRequest;

-- Find the number of classes having more than 1000 signup requests where the class is the first or second choice. 
explain
SELECT count(*) from (
SELECT count(*) FROM classes, classRequest 
                             WHERE classRequest.firstChoiceClassID = classes.classID or
                                            classRequest.secondChoiceClassID = classes.classID
GROUP BY classID
HAVING count(*) > 1000);
                                                 info
------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • group (scalar)
  │ estimated row count: 1
  │
  └── • filter
      │ estimated row count: 333
      │ filter: count_rows > 1000
      │
      └── • group (hash)
          │ estimated row count: 1,000
          │ group by: classid
          │
          └── • cross join
              │ estimated row count: 33,333,333
              │ pred: (firstchoiceclassid = classid) OR (secondchoiceclassid = classid)
              │
              ├── • scan
              │     estimated row count: 100,000 (100% of the table; stats collected 12 minutes ago)
              │     table: classrequest@classrequest_pkey
              │     spans: FULL SCAN
              │
              └── • scan
                    estimated row count: 1,000 (100% of the table; stats collected 12 minutes ago)
                    table: classes@classes_pkey
                    spans: FULL SCAN

SELECT count(*) from (
SELECT count(*) FROM classes, classRequest 
                             WHERE classRequest.firstChoiceClassID = classes.classID or
                                            classRequest.secondChoiceClassID = classes.classID
GROUP BY classID
HAVING count(*) > 1000);
 count
---------
    100
(1 row)

Time: 1.216s

```
A faster query plan:
```
                                                   info
------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • group (scalar)
  │ estimated row count: 1
  │
  └── • filter
      │ estimated row count: 333
      │ filter: count_rows > 1000
      │
      └── • group (streaming)
          │ estimated row count: 1,000
          │ group by: classid
          │ ordered: +classid
          │
          └── • distinct
              │ estimated row count: 200,000
              │ distinct on: classid, studentid
              │ order key: classid
              │
              └── • union all
                  │ estimated row count: 200,000
                  │
                  ├── • merge join
                  │   │ estimated row count: 100,000
                  │   │ equality: (firstchoiceclassid) = (classid)
                  │   │ right cols are key
                  │   │
                  │   ├── • scan
                  │   │     estimated row count: 100,000 (100% of the table; stats collected 1 minute ago)
                  │   │     table: classrequest@firstidx
                  │   │     spans: FULL SCAN
                  │   │
                  │   └── • scan
                  │         estimated row count: 1,000 (100% of the table; stats collected 1 minute ago)
                  │         table: classes@classes_pkey
                  │         spans: FULL SCAN
                  │
                  └── • merge join
                      │ estimated row count: 100,000
                      │ equality: (secondchoiceclassid) = (classid)
                      │ right cols are key
                      │
                      ├── • scan
                      │     estimated row count: 100,000 (100% of the table; stats collected 1 minute ago)
                      │     table: classrequest@secondidx
                      │     spans: FULL SCAN
                      │
                      └── • scan
                            estimated row count: 1,000 (100% of the table; stats collected 1 minute ago)
                            table: classes@classes_pkey
                            spans: FULL SCAN


SELECT count(*) FROM classes, classRequest WHERE classRequest.firstChoiceClassID = classes.classID or
                                          classRequest.secondChoiceClassID = classes.classID
group by classID
having count(*) > 1000);
  count
---------
    100
(1 row)


Time: 164ms

```



Jira issue: CRDB-12023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split a disjunction of equijoin predicates into a union of joins #74302

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Split a disjunction of equijoin predicates into a union of joins #74302

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions