opt: infer lookup join equality conditions from unique key and foreign key

Consider the following example, using `./cockroach demo --global --empty --nodes=9`:
```sql
CREATE DATABASE db PRIMARY REGION "us-east1" REGIONS "us-west1", "europe-west1";
USE db;

CREATE TABLE accounts (
    account_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        STRING NOT NULL,
    crdb_region crdb_internal_region NOT NULL,
    UNIQUE INDEX (account_id, crdb_region) -- this shouldn't be needed, but is. See https://github.com/cockroachdb/cockroach/issues/64619.
) LOCALITY GLOBAL;

CREATE TABLE tweets (
    account_id UUID NOT NULL,
    tweet_id   UUID DEFAULT gen_random_uuid(),
    message    STRING NOT NULL,
    crdb_region crdb_internal_region NOT NULL,
    PRIMARY KEY (crdb_region, account_id, tweet_id),
    FOREIGN KEY (account_id, crdb_region) REFERENCES accounts (account_id, crdb_region) ON DELETE CASCADE ON UPDATE CASCADE
) LOCALITY REGIONAL BY ROW;

INSERT INTO accounts VALUES (DEFAULT, 'starburst', 'us-east1') RETURNING account_id;

INSERT INTO tweets VALUES (
    '6f781502-4936-43cc-b384-04e5cf292cc8',
    DEFAULT,
    'this is tweeet 1',
    (SELECT crdb_region FROM accounts WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8')
) RETURNING tweet_id;

INSERT INTO tweets VALUES (
    '6f781502-4936-43cc-b384-04e5cf292cc8',
    DEFAULT,
    'this is tweeet 2',
    (SELECT crdb_region FROM accounts WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8')
) RETURNING tweet_id;
```
Ideally, the following query would be able to use a lookup join and visit only a single region, but it currently uses a merge join:
```sql
EXPLAIN SELECT tweet_id, message
FROM accounts
JOIN tweets USING (account_id)
WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8';
                                                                                                                                                                         info
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • merge join
  │ estimated row count: 2
  │ equality: (account_id) = (account_id)
  │ left cols are key
  │
  ├── • scan
  │     estimated row count: 1 (100% of the table; stats collected 1 minute ago)
  │     table: accounts@accounts_account_id_crdb_region_key
  │     spans: [/'6f781502-4936-43cc-b384-04e5cf292cc8'/'europe-west1' - /'6f781502-4936-43cc-b384-04e5cf292cc8'/'us-west1']
  │
  └── • scan
        estimated row count: 2 (100% of the table; stats collected 1 minute ago)
        table: tweets@primary
        spans: [/'europe-west1'/'6f781502-4936-43cc-b384-04e5cf292cc8' - /'europe-west1'/'6f781502-4936-43cc-b384-04e5cf292cc8'] [/'us-east1'/'6f781502-4936-43cc-b384-04e5cf292cc8' - /'us-east1'/'6f781502-4936-43cc-b384-04e5cf292cc8'] [/'us-west1'/'6f781502-4936-43cc-b384-04e5cf292cc8' - /'us-west1'/'6f781502-4936-43cc-b384-04e5cf292cc8']
(17 rows)
```
We can force a lookup join, but it still requires visiting all regions:
```sql
EXPLAIN SELECT tweet_id, message
FROM accounts
INNER LOOKUP JOIN tweets USING (account_id)
WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8';
                                                             info
------------------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join
  │ estimated row count: 2
  │ table: tweets@primary
  │ lookup condition: (account_id = account_id) AND (crdb_region IN ('europe-west1', 'us-east1', 'us-west1'))
  │ pred: account_id = '6f781502-4936-43cc-b384-04e5cf292cc8'
  │
  └── • scan
        estimated row count: 1 (100% of the table; stats collected 2 minutes ago)
        table: accounts@accounts_account_id_crdb_region_key
        spans: [/'6f781502-4936-43cc-b384-04e5cf292cc8'/'europe-west1' - /'6f781502-4936-43cc-b384-04e5cf292cc8'/'us-west1']
(13 rows)
```
In order for the optimizer to plan a lookup join that visits a single region, it would need to infer that the join condition `USING (account_id)` is actually equivalent to `USING (account_id, crdb_region)`. This inference should be possible using the fact that `account_id` is the primary key of `accounts` as well as the fact that there is a foreign key in `tweets` that references `accounts (account_id, crdb_region)`. Including `crdb_region` would allow the optimizer to change the lookup condition from `(crdb_region IN ('europe-west1', 'us-east1', 'us-west1'))` to another equality condition `(crdb_region = crdb_region)`, and thus visit the single region that contains 'ab887bbc-1a83-4324-8998-d18ebe448fa7'. The resulting plan should look like the following:
```sql
EXPLAIN SELECT tweet_id, message
FROM accounts
JOIN tweets USING (crdb_region, account_id)
WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8';
                                                             info
------------------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join
  │ estimated row count: 2
  │ table: tweets@primary
  │ equality: (crdb_region, account_id) = (crdb_region,account_id)
  │ pred: account_id = '6f781502-4936-43cc-b384-04e5cf292cc8'
  │
  └── • scan
        estimated row count: 1 (100% of the table; stats collected 3 minutes ago)
        table: accounts@accounts_account_id_crdb_region_key
        spans: [/'6f781502-4936-43cc-b384-04e5cf292cc8'/'europe-west1' - /'6f781502-4936-43cc-b384-04e5cf292cc8'/'us-west1']
(13 rows)
```

cc @nvanbenschoten 

Epic CRDB-26292

Jira issue: CRDB-9686

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: infer lookup join equality conditions from unique key and foreign key #69617

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

opt: infer lookup join equality conditions from unique key and foreign key #69617

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions