Skip to content

opt: synthesize lookup join equalities for computed columns#76817

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
mgartner:infer-computed-column-equalities
Feb 23, 2022
Merged

opt: synthesize lookup join equalities for computed columns#76817
craig[bot] merged 1 commit intocockroachdb:masterfrom
mgartner:infer-computed-column-equalities

Conversation

@mgartner
Copy link
Copy Markdown
Contributor

@mgartner mgartner commented Feb 19, 2022

Transformation rules that generate lookup joins can now synthesize
equality constraints for indexed computed columns. This allows lookup
joins to be planned in more cases.

For example, consider the tables and query:

CREATE TABLE a (
  a INT
)

CREATE TABLE bc (
  b INT,
  c INT NOT NULL AS (b + 1) STORED,
  INDEX c_b_idx (c, b)
)

SELECT * FROM a JOIN b ON a = b

We can add an equality constraint for c because c is a function of
b and b has an equality constraint in the join predicate:

SELECT * FROM a JOIN b ON a = b AND a + 1 = c

With the addition of the computed column equality, a lookup join
utilizing c_b_idx can be planned.

This is particularly important for improving plans for queries on tables
with hash-sharded indexes, because the shard column is a computed column
and it is unlikely to be constrained explicitly.

Release note (performance improvement): The optimizer attempts to plan
lookup joins on indexes that include computed columns in more cases,
which may improve query plans.

@mgartner mgartner requested a review from a team as a code owner February 19, 2022 23:56
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mgartner mgartner force-pushed the infer-computed-column-equalities branch 2 times, most recently from 5d6cea9 to 7937e65 Compare February 20, 2022 19:54
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful! It's awesome that we can get such a significant improvement with relatively little new code. :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @mgartner, and @rytaft)


pkg/sql/opt/xform/join_funcs.go, line 421 at r1 (raw file):

			// synthesized for it, we can project a column from the join's input
			// that can be used as a key column. We create the projection here,
			// and construct a Project expression that wrap's the join's input

[nit] wraps


pkg/sql/opt/xform/join_funcs.go, line 543 at r1 (raw file):

		}

		// Wrap the input in a Project if any projections are required.

[nit] Maybe mentions that the LookupJoin will then essentially project away these synthesized columns


pkg/sql/opt/xform/join_funcs.go, line 1238 at r1 (raw file):

// is possible when:
//
//   1. col is non-nullable.

Possible TODO for the future: for (1) it would also be sufficient to prove that the expression itself can't evaluate to NULL, when the input props are taken in consideration. E.g. maybe x+1 is not nullable in the descriptor, butif in our input x is already not-NULL.

Actually, we can push this reasoning further: if all the outer columns have equality constraints, we already know they can't be NULL.. So any expression that can't be NULL on non-NULL outer columns should be fine (true of most expressions).

Copy link
Copy Markdown
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! :lgtm_strong:
Looks like this won't fix the issue with partitioned hash index since the shard column is not 1st column?

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: awesome!

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 3 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/table_meta.go, line 303 at r1 (raw file):

	}
	e, ok := tm.ComputedCols[id]
	return e, ok

nit: can you just do return tm.ComputedCols[id]? Or does that not work


pkg/ccl/logictestccl/testdata/logic_test/multi_region_query_behavior, line 223 at r1 (raw file):

│       └── • lookup join (anti)
│           │ table: customer@primary
│           │ equality: (crdb_region_eq, h_c_w_id, h_c_d_id, h_c_id) = (crdb_region,c_w_id,c_d_id,c_id)

It seems like this is removing the locality optimized anti join. But maybe that's fine since we shouldn't need locality optimized search for cases when crdb_region is computed?


pkg/sql/opt/xform/testdata/rules/join, line 2782 at r1 (raw file):

----

# Generate an lookup join by synthesizing an equality constraint for an indexed

nit: an lookup -> a lookup

@mgartner mgartner force-pushed the infer-computed-column-equalities branch from 7937e65 to 1ab83c6 Compare February 22, 2022 16:28
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 3 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/table_meta.go, line 303 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: can you just do return tm.ComputedCols[id]? Or does that not work

That's not allowed in Go unfortunately: https://go.dev/play/p/H0W5CTIM469


pkg/sql/opt/xform/join_funcs.go, line 421 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] wraps

Done.


pkg/sql/opt/xform/join_funcs.go, line 543 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] Maybe mentions that the LookupJoin will then essentially project away these synthesized columns

Done.


pkg/sql/opt/xform/join_funcs.go, line 1238 at r1 (raw file):

Previously, RaduBerinde wrote…

Possible TODO for the future: for (1) it would also be sufficient to prove that the expression itself can't evaluate to NULL, when the input props are taken in consideration. E.g. maybe x+1 is not nullable in the descriptor, butif in our input x is already not-NULL.

Actually, we can push this reasoning further: if all the outer columns have equality constraints, we already know they can't be NULL.. So any expression that can't be NULL on non-NULL outer columns should be fine (true of most expressions).

Great point. Should be pretty easy to do with memo.ExprIsNeverNull. I'll probably have a follow-up PR shortly that includes this. I've add a TODO for now.


pkg/ccl/logictestccl/testdata/logic_test/multi_region_query_behavior, line 223 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It seems like this is removing the locality optimized anti join. But maybe that's fine since we shouldn't need locality optimized search for cases when crdb_region is computed?

Exactly. This plan is better now because we only need to search one region when the region is a function of the values we are inserting. But I'm glad you pointed this out, because it looks like a regression test specifically for costing LOS anti-lookup joins. I've changed the region columns from computed columns to DEFAULT gatewat_region() so that the LOS remains in the plan, and left a note mentioning this difference from the real TPCC schema.


pkg/sql/opt/xform/testdata/rules/join, line 2782 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: an lookup -> a lookup

Done.

@mgartner mgartner force-pushed the infer-computed-column-equalities branch from 1ab83c6 to fac7bd7 Compare February 22, 2022 16:30
Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)


pkg/sql/opt/table_meta.go, line 303 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

That's not allowed in Go unfortunately: https://go.dev/play/p/H0W5CTIM469

Thanks for the link!


pkg/ccl/logictestccl/testdata/logic_test/multi_region_query_behavior, line 223 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Exactly. This plan is better now because we only need to search one region when the region is a function of the values we are inserting. But I'm glad you pointed this out, because it looks like a regression test specifically for costing LOS anti-lookup joins. I've changed the region columns from computed columns to DEFAULT gatewat_region() so that the LOS remains in the plan, and left a note mentioning this difference from the real TPCC schema.

Seems useful to also have some tests with the better plan for computed columns, though. Want to maybe add back at least one of those somewhere?

@mgartner mgartner force-pushed the infer-computed-column-equalities branch from fac7bd7 to 7b68674 Compare February 22, 2022 21:35
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)


pkg/ccl/logictestccl/testdata/logic_test/multi_region_query_behavior, line 223 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Seems useful to also have some tests with the better plan for computed columns, though. Want to maybe add back at least one of those somewhere?

Good idea. I added a new test in this file.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

Transformation rules that generate lookup joins can now synthesize
equality constraints for indexed computed columns. This allows lookup
joins to be planned in more cases.

For example, consider the tables and query:

    CREATE TABLE a (
      a INT
    )

    CREATE TABLE bc (
      b INT,
      c INT NOT NULL AS (b + 1) STORED,
      INDEX c_b_idx (c, b)
    )

    SELECT * FROM a JOIN b ON a = b

We can add an equality constraint for `c` because `c` is a function of
`b` and `b` has an equality constraint in the join predicate:

    SELECT * FROM a JOIN b ON a = b AND a + 1 = c

With the addition of the computed column equality, a lookup join
utilizing `c_b_idx` can be planned.

This is particularly important for improving plans for queries on tables
with hash-sharded indexes, because the shard column is a computed column
and it is unlikely to be constrained explicitly.

Release note (performance improvement): The optimizer attempts to plan
lookup joins on indexes that include computed columns in more cases,
which may improve query plans.
@mgartner mgartner force-pushed the infer-computed-column-equalities branch from 7b68674 to 58e5a96 Compare February 23, 2022 16:47
@mgartner
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 23, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 23, 2022

Build succeeded:

@craig craig bot merged commit ad0cdd0 into cockroachdb:master Feb 23, 2022
@mgartner mgartner deleted the infer-computed-column-equalities branch February 24, 2022 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants