Skip to content

opt: join reordering can produce incorrect query plans #76522

@mgartner

Description

@mgartner

Join reordering can produce query plans where filters in the original plan are missing, causing incorrect query results.

Below is a reproduction of the issue. Notice that the t2.should_not_be_eliminated = t5.should_not_be_eliminated filter is missing from the final query plan.

exec-ddl
CREATE TABLE t1 (
	a INT NOT NULL,
	b INT NOT NULL,
	PRIMARY KEY (a ASC, b ASC)
)
----

exec-ddl
CREATE TABLE t2 (
	a INT NOT NULL,
	c INT,
	should_not_be_eliminated INT,
	PRIMARY KEY (a ASC)
);
----

exec-ddl
CREATE TABLE t3 (
	a INT NOT NULL,
	d INT NOT NULL,
	PRIMARY KEY (a ASC, d ASC)
);
----

exec-ddl
CREATE TABLE t4 (
	e INT NOT NULL,
	f INT,
	g INT,
	PRIMARY KEY (e ASC)
);
----

exec-ddl
CREATE TABLE t5 (
	h INT NOT NULL,
	f INT NOT NULL,
	g INT NOT NULL,
	b INT,
	should_not_be_eliminated INT,
	c INT,
	PRIMARY KEY (h ASC, f ASC, g ASC)
);
----

# Give t1 many rows where a has many distincts.
exec-ddl
ALTER TABLE t1 INJECT STATISTICS '[
    {
        "columns": [
            "a"
        ],
        "created_at": "2022-01-17 12:51:38.433911",
        "distinct_count": 9161427,
        "null_count": 0,
        "row_count": 44484238
    }
]';
----

# Give t2 many rows where a has many distincts.
exec-ddl
ALTER TABLE t2 INJECT STATISTICS '[
    {
        "columns": [
            "a"
        ],
        "created_at": "2022-01-17 12:51:38.433911",
        "distinct_count": 17014025,
        "null_count": 0,
        "row_count": 17024553
    }
]';
----

# Give t3 many rows where a has many distincts.
exec-ddl
ALTER TABLE t3 INJECT STATISTICS '[
    {
        "columns": [
            "a"
        ],
        "created_at": "2022-01-17 12:51:38.433911",
        "distinct_count": 17187349,
        "null_count": 0,
        "row_count": 18138540
    }
]';
----

# Give t4 many rows where e has many distincts.
exec-ddl
ALTER TABLE t4 INJECT STATISTICS '[
    {
        "columns": [
            "e"
        ],
        "created_at": "2022-01-17 12:51:38.433911",
        "distinct_count": 346919,
        "null_count": 0,
        "row_count": 346109
    }
]';
----

# Give t5 few rows.
exec-ddl
ALTER TABLE t5 INJECT STATISTICS '[
    {
        "columns": [
            "h"
        ],
        "created_at": "2022-01-17 12:51:38.433911",
        "distinct_count": 119,
        "null_count": 0,
        "row_count": 119
    }
]';
----

# Notice that some of the filters are missing entirely:
#
#   t5.c = t2.c
#   t2.should_not_be_eliminated = t5.should_not_be_eliminated
#
opt
SELECT
  t2.a
FROM
  t1
  INNER JOIN t2 ON t1.a = t2.a
  INNER JOIN t3 ON t1.a = t3.a
  INNER JOIN t4 ON t3.d = t4.e
  INNER JOIN t5 ON
      t4.f = t5.f
      AND t4.g = t5.g
      AND t5.b = t1.b
      AND t5.c = t2.c
WHERE
  t1.a = 123456 AND t2.should_not_be_eliminated = t5.should_not_be_eliminated;
----
project
 ├── columns: a:5!null
 └── inner-join (lookup t1)
      ├── columns: t1.a:1!null t1.b:2!null t2.a:5!null t2.c:6!null t2.should_not_be_eliminated:7!null t3.a:10!null d:11!null e:14!null t4.f:15!null t4.g:16!null t5.f:20!null t5.g:21!null t5.b:22!null t5.should_not_be_eliminated:23!null t5.c:24!null
      ├── key columns: [5 22] = [1 2]
      ├── lookup columns are key
      ├── inner-join (lookup t2)
      │    ├── columns: t2.a:5!null t2.c:6 t2.should_not_be_eliminated:7 t3.a:10!null d:11!null e:14!null t4.f:15!null t4.g:16!null t5.f:20!null t5.g:21!null t5.b:22 t5.should_not_be_eliminated:23 t5.c:24
      │    ├── key columns: [10] = [5]
      │    ├── lookup columns are key
      │    ├── inner-join (hash)
      │    │    ├── columns: t3.a:10!null d:11!null e:14!null t4.f:15!null t4.g:16!null t5.f:20!null t5.g:21!null t5.b:22 t5.should_not_be_eliminated:23 t5.c:24
      │    │    ├── scan t5
      │    │    │    └── columns: t5.f:20!null t5.g:21!null t5.b:22 t5.should_not_be_eliminated:23 t5.c:24
      │    │    ├── inner-join (lookup t4)
      │    │    │    ├── columns: t3.a:10!null d:11!null e:14!null t4.f:15 t4.g:16
      │    │    │    ├── key columns: [11] = [14]
      │    │    │    ├── lookup columns are key
      │    │    │    ├── scan t3
      │    │    │    │    ├── columns: t3.a:10!null d:11!null
      │    │    │    │    └── constraint: /10/11: [/123456 - /123456]
      │    │    │    └── filters (true)
      │    │    └── filters
      │    │         ├── t4.f:15 = t5.f:20
      │    │         └── t4.g:16 = t5.g:21
      │    └── filters
      │         └── t2.a:5 = 123456
      └── filters
           └── t1.a:1 = 123456

Metadata

Metadata

Assignees

Labels

C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.C-technical-advisoryCaused a technical advisoryS-0-visible-logical-errorDatabase stores inconsistent data in some cases, or queries return invalid results silently.T-sql-queriesSQL Queries Teambranch-release-20.2Used to mark GA and release blockers, technical advisories, and bugs for 20.2branch-release-21.1Used to mark GA and release blockers, technical advisories, and bugs for 21.1branch-release-21.2Used to mark GA and release blockers, technical advisories, and bugs for 21.2

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions