opt: synthesize lookup join equalities for computed columns#76817
opt: synthesize lookup join equalities for computed columns#76817craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
5d6cea9 to
7937e65
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Beautiful! It's awesome that we can get such a significant improvement with relatively little new code.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @mgartner, and @rytaft)
pkg/sql/opt/xform/join_funcs.go, line 421 at r1 (raw file):
// synthesized for it, we can project a column from the join's input // that can be used as a key column. We create the projection here, // and construct a Project expression that wrap's the join's input
[nit] wraps
pkg/sql/opt/xform/join_funcs.go, line 543 at r1 (raw file):
} // Wrap the input in a Project if any projections are required.
[nit] Maybe mentions that the LookupJoin will then essentially project away these synthesized columns
pkg/sql/opt/xform/join_funcs.go, line 1238 at r1 (raw file):
// is possible when: // // 1. col is non-nullable.
Possible TODO for the future: for (1) it would also be sufficient to prove that the expression itself can't evaluate to NULL, when the input props are taken in consideration. E.g. maybe x+1 is not nullable in the descriptor, butif in our input x is already not-NULL.
Actually, we can push this reasoning further: if all the outer columns have equality constraints, we already know they can't be NULL.. So any expression that can't be NULL on non-NULL outer columns should be fine (true of most expressions).
chengxiong-ruan
left a comment
There was a problem hiding this comment.
Thanks for the fix!
Looks like this won't fix the issue with partitioned hash index since the shard column is not 1st column?
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status:complete! 3 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/table_meta.go, line 303 at r1 (raw file):
} e, ok := tm.ComputedCols[id] return e, ok
nit: can you just do return tm.ComputedCols[id]? Or does that not work
pkg/ccl/logictestccl/testdata/logic_test/multi_region_query_behavior, line 223 at r1 (raw file):
│ └── • lookup join (anti) │ │ table: customer@primary │ │ equality: (crdb_region_eq, h_c_w_id, h_c_d_id, h_c_id) = (crdb_region,c_w_id,c_d_id,c_id)
It seems like this is removing the locality optimized anti join. But maybe that's fine since we shouldn't need locality optimized search for cases when crdb_region is computed?
pkg/sql/opt/xform/testdata/rules/join, line 2782 at r1 (raw file):
---- # Generate an lookup join by synthesizing an equality constraint for an indexed
nit: an lookup -> a lookup
7937e65 to
1ab83c6
Compare
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 3 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/table_meta.go, line 303 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: can you just do
return tm.ComputedCols[id]? Or does that not work
That's not allowed in Go unfortunately: https://go.dev/play/p/H0W5CTIM469
pkg/sql/opt/xform/join_funcs.go, line 421 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] wraps
Done.
pkg/sql/opt/xform/join_funcs.go, line 543 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] Maybe mentions that the LookupJoin will then essentially project away these synthesized columns
Done.
pkg/sql/opt/xform/join_funcs.go, line 1238 at r1 (raw file):
Previously, RaduBerinde wrote…
Possible TODO for the future: for (1) it would also be sufficient to prove that the expression itself can't evaluate to NULL, when the input props are taken in consideration. E.g. maybe
x+1is not nullable in the descriptor, butif in our inputxis already not-NULL.Actually, we can push this reasoning further: if all the outer columns have equality constraints, we already know they can't be NULL.. So any expression that can't be NULL on non-NULL outer columns should be fine (true of most expressions).
Great point. Should be pretty easy to do with memo.ExprIsNeverNull. I'll probably have a follow-up PR shortly that includes this. I've add a TODO for now.
pkg/ccl/logictestccl/testdata/logic_test/multi_region_query_behavior, line 223 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It seems like this is removing the locality optimized anti join. But maybe that's fine since we shouldn't need locality optimized search for cases when
crdb_regionis computed?
Exactly. This plan is better now because we only need to search one region when the region is a function of the values we are inserting. But I'm glad you pointed this out, because it looks like a regression test specifically for costing LOS anti-lookup joins. I've changed the region columns from computed columns to DEFAULT gatewat_region() so that the LOS remains in the plan, and left a note mentioning this difference from the real TPCC schema.
pkg/sql/opt/xform/testdata/rules/join, line 2782 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: an lookup -> a lookup
Done.
1ab83c6 to
fac7bd7
Compare
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)
pkg/sql/opt/table_meta.go, line 303 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
That's not allowed in Go unfortunately: https://go.dev/play/p/H0W5CTIM469
Thanks for the link!
pkg/ccl/logictestccl/testdata/logic_test/multi_region_query_behavior, line 223 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Exactly. This plan is better now because we only need to search one region when the region is a function of the values we are inserting. But I'm glad you pointed this out, because it looks like a regression test specifically for costing LOS anti-lookup joins. I've changed the region columns from computed columns to
DEFAULT gatewat_region()so that the LOS remains in the plan, and left a note mentioning this difference from the real TPCC schema.
Seems useful to also have some tests with the better plan for computed columns, though. Want to maybe add back at least one of those somewhere?
fac7bd7 to
7b68674
Compare
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)
pkg/ccl/logictestccl/testdata/logic_test/multi_region_query_behavior, line 223 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Seems useful to also have some tests with the better plan for computed columns, though. Want to maybe add back at least one of those somewhere?
Good idea. I added a new test in this file.
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
Transformation rules that generate lookup joins can now synthesize
equality constraints for indexed computed columns. This allows lookup
joins to be planned in more cases.
For example, consider the tables and query:
CREATE TABLE a (
a INT
)
CREATE TABLE bc (
b INT,
c INT NOT NULL AS (b + 1) STORED,
INDEX c_b_idx (c, b)
)
SELECT * FROM a JOIN b ON a = b
We can add an equality constraint for `c` because `c` is a function of
`b` and `b` has an equality constraint in the join predicate:
SELECT * FROM a JOIN b ON a = b AND a + 1 = c
With the addition of the computed column equality, a lookup join
utilizing `c_b_idx` can be planned.
This is particularly important for improving plans for queries on tables
with hash-sharded indexes, because the shard column is a computed column
and it is unlikely to be constrained explicitly.
Release note (performance improvement): The optimizer attempts to plan
lookup joins on indexes that include computed columns in more cases,
which may improve query plans.
7b68674 to
58e5a96
Compare
|
bors r+ |
|
Build failed (retrying...): |
|
Build succeeded: |
Transformation rules that generate lookup joins can now synthesize
equality constraints for indexed computed columns. This allows lookup
joins to be planned in more cases.
For example, consider the tables and query:
We can add an equality constraint for
cbecausecis a function ofbandbhas an equality constraint in the join predicate:With the addition of the computed column equality, a lookup join
utilizing
c_b_idxcan be planned.This is particularly important for improving plans for queries on tables
with hash-sharded indexes, because the shard column is a computed column
and it is unlikely to be constrained explicitly.
Release note (performance improvement): The optimizer attempts to plan
lookup joins on indexes that include computed columns in more cases,
which may improve query plans.