opt: add rule to efficiently enumerate join orderings#51180
opt: add rule to efficiently enumerate join orderings#51180craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
Note that the Q7 plan is slower for higher join limits - this is because of costing error. Also note that a join limit of 3 on master corresponds to the command |
|
Planning times for this worst-case query with different join limits for master vs rule: This is a worst-case query because ReorderJoins only enumerates orderings that do not introduce new cross joins. Because all base relations in this query are connected, all join orderings are considered worth adding to the memo, and so planning time grows very quickly with join tree size. As with my other comment, keep in mind that (for example) the '1 Join' column in the table corresponds to the command |
|
I'm thinking a default join limit of 8 joins is a good place to start, but other opinions are welcome. |
|
Next Steps:
|
|
@rytaft I'm seeing some unexpected differences in stats and costs even with the same plans, and even when those plans don't involve inner joins: |
|
This is cool! Looking forward to reviewing in detail. From an initial look, the change is surprisingly non-invasive! I think @justinj might be interested in looking at this too! Starting with a join limit of 8 for now is reasonable. I think we will probably want to make it much higher, and add a limit on the number of orderings that we examine (effectively putting a limit on the planning time). It would be a pity to not take advantage of the algorithm for a large number of joins where we can still efficiently enumerate all (non-cross-join inducing) orderings. The costs you asked about changed because you added some testcases for which you injected stats for those tables. Here's the diff for the plans: |
DrewKimball
left a comment
There was a problem hiding this comment.
Ah, thanks Radu. Should have thought of that.
Reviewable status:
complete! 0 of 0 LGTMs obtained
|
Very cool! I also didn't look in huge detail but very exciting! |
|
Does this PR improve the planning time for #43039? |
andy-kimball
left a comment
There was a problem hiding this comment.
, some of my feedback is already incorporated, I left comments for some other things I noticed. Biggest thing is that I think you should unit test the
joinOrderBuilder.go given it's a good-sized file with some tricky cases. Overall, it's great work and will be a big step forward for the optimizer on its path towards being best-in-industry. : )
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/xform/join_order_builder.go, line 132 at r1 (raw file):
// // Citations: [8] type joinOrderBuilder struct {
I think it might make sense to unit test some of the methods of this struct in a join_order_builder_test.go file. For example, I could imagine a few tests that check that vertexes/edges/plans are built correctly for 2-3 join cases. It worries me that we have only higher-level tests that spew out so much output that it can be hard to detect subtle bugs in the logic.
pkg/sql/opt/xform/testdata/external/tpce-no-stats, line 66 at r1 (raw file):
│ │ │ ├── multiplicity: left-rows(zero-or-more), right-rows(zero-or-one) │ │ │ ├── fd: ()-->(1,2,5), (6)-->(9), (15)-->(20), (37)-->(39), (39)-->(37), (36)==(37), (37)==(36), (15)==(33), (33)==(15), (6)==(20), (20)==(6), (3)==(9), (9)==(3), (1)==(5), (5)==(1) │ │ │ ├── scan broker@secondary
You should look at each of @nvanbenschoten's hand-tuned queries and see how they compare with what the optimizer selects. Here is a link to one of the files: https://github.com/cockroachlabs/tpc-e/blob/f3f0f57c61ba7e324c0273acf573238e739bcfe2/tier-a/src/txn_harness/broker_volume.rs. @nvanbenschoten, what's best way to experiment with each of these queries in isolation, to see how well the optimizer's plans hold up vs. the hand-tuned versions?
pkg/sql/opt/xform/testdata/rules/join, line 467 at r1 (raw file):
# Case where an implicit filter is used in the final plan (abc.a = pqr.p in the # merge join) and equivalence closure is ensured for constant equalities (note
Update comment, since you're not putting constant equality support into this PR.
pkg/sql/opt/xform/testdata/rules/join, line 3726 at r1 (raw file):
SELECT * FROM (SELECT * FROM pqr ORDER BY q LIMIT 5) INNER JOIN abc ON a=r ---- inner-join (hash)
How come this one switched from lookup to hash? I'd think lookup is better in this case.
|
I see that several plans for TPCH queries have changed, and it might be interesting to see how the execution runtimes of those queries have changed. I've actually just merged a
|
|
Oh, I didn't notice the second table in the image above that contains the runtime comparison already. Nice speedup on Q2! |
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/xform/rules/join.opt, line 11 at r1 (raw file):
# not reordered. For more information, see the comment in join_order_builder.go. # # TODO(drewk): Currently, ReorderJoins will match each join expression in a
This would be a significant optimization, right? We'll have to brainstorm a way to do it in the future. I don't see how without doing a special traversal of the expression before exploration starts.
It may be worthwhile to see how much difference it would make (you could hackShouldReorderJoins and hardcode a specific JoinSize for a test query, maybe the worst case query you mentioned)
I believe @DrewKimball has TPC-E 5k up and running on a roachprod cluster now, so the best way to experiment with the queries in isolation is probably just to run them with and without hand-tuning on that cluster while the load is running. You'll need to bump up the The three important queries to focus on in TPC-E are:
|
|
pkg/sql/opt/xform/rules/join.opt, line 11 at r1 (raw file): Previously, RaduBerinde wrote…
It looks like it shaves a couple 10's of ms off of planning time for that query with join limit = 8. That said, you're right that it's difficult to see a non-messy way to do this. I think the best solution is probably to simply implement a more efficient algorithm than DPSub, and see where that gets us. |
|
pkg/sql/opt/xform/testdata/rules/join, line 3726 at r1 (raw file): Previously, andy-kimball (Andy Kimball) wrote…
That's because the stats changed when I moved the old AssociateJoins tests to the top of the file. Table abc is altered to have only 100 rows in the GenerateInvertedIndexZigzagJoins tests. |
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @DrewKimball, and @RaduBerinde)
pkg/sql/opt/xform/testdata/rules/join, line 467 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Update comment, since you're not putting constant equality support into this PR.
Done.
Is there a reason why it has to be a linux binary in particular? The way I've been getting stats is a pain, so I certainly wouldn't mind having a better method. |
|
pkg/sql/opt/xform/rules/join.opt, line 11 at r1 (raw file): Previously, DrewKimball (Drew Kimball) wrote…
Ah, much less than I expected |
|
pkg/sql/opt/xform/rules/join.opt, line 11 at r1 (raw file): Previously, RaduBerinde wrote…
Well, with a high join limit, most time is spent optimizing rather than enumerating join orders, so this makes sense. Also, the nature of join reordering is such that the enumerating orders for the largest join tree (in this case with 8 joins) takes more time than the rest combined. Ideally, I'd like to pull CommuteJoin into the joinOrderBuilder logic. The reason I haven't yet is because we want to commute above the join limit (since commuting is relatively cheap) but DPSub still has to enumerate all conceivable join orders even when only commuting is allowed. |
|
That makes sense. I did consider at some point making the coster and execbuilder consider both variants of a hash join, in which case we wouldn't care about commuting (outside of join reordering). Some of the rules generating merge and lookup joins would need to be modified, they currently assume that they will run against both versions of the original join. |
rytaft
left a comment
There was a problem hiding this comment.
This is really exciting to see! My main comment is that the final plan often ends up with some unnecessary filters due to the equality closure. It would be good if you can find a way to remove the unnecessary filters.
Also added some comments inline.
Reviewed 38 of 39 files at r1, 1 of 1 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @DrewKimball, and @RaduBerinde)
pkg/sql/opt/norm/testdata/rules/select, line 396 at r2 (raw file):
# Regression test for #42035. This test uses the opt directive because the rule # is triggered during exploration. opt expect=ConsolidateSelectFilters disable=(InlineConstVar,RightAssociateJoinsLeft,RightAssociateJoinsRight)
If this rule is no longer triggered then I don't think this is testing what it was meant to anymore.
pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):
│ │ └── abc.a:4 = ac.a:10 │ └── filters │ ├── abc.a:4 = ab.a:7
this filter is unnecessary.
pkg/sql/opt/props/func_dep_test.go, line 196 at r2 (raw file):
{col1: 2, col2: 1, expected: true}, {col1: 1, col2: 3, expected: true}, {col1: 2, col2: 3, expected: true},
this is a duplicate -- maybe you should test 3 2 or 3 1
pkg/sql/opt/xform/join_order_builder.go, line 29 at r2 (raw file):
// // First, the initial join tree is traversed and encoded as a graph (technically // a hypergraph). The 'leaves' of the join tree become the vertexes of the
I think it would help to explain why it's a hypergraph -- that doesn't become clear until you look at the code.
pkg/sql/opt/xform/join_order_builder.go, line 89 at r2 (raw file):
// │ └── filters (true) // └── filters // └── a = x
this line is offset from the rest of the tree
pkg/sql/opt/xform/join_order_builder.go, line 92 at r2 (raw file):
// // Note that the implicit a = x edge that wasn't present in the original query // is used in the merge join between xy and uv. Also note that the xy and ab
there's no merge join in the above plan....
pkg/sql/opt/xform/join_order_builder.go, line 109 at r2 (raw file):
// │ └── filters (true) // └── filters // └── a = u
this line is offset from the rest of the tree
pkg/sql/opt/xform/join_order_builder.go, line 151 at r2 (raw file):
edges map[relationSet]memo.FiltersExpr // Plans maps from a set of base relations to the memo group for the join tree
[nit] Plans -> plans
pkg/sql/opt/xform/join_order_builder.go, line 232 at r2 (raw file):
// on the join's left and right inputs, so all base relations under the join // were added to jb.vertexes. relSet := relationSet((1<<(endIdx-startIdx) - 1) << startIdx)
Shouldn't this be (((1<<(endIdx-startIdx)) - 1) << startIdx)? i.e., aren't you missing some parentheses?
I think + and - have higher operator precedence than << and >>
pkg/sql/opt/xform/join_order_builder.go, line 298 at r2 (raw file):
// Enumerate all possible pairwise-disjoint binary partitions of the subset, // s1 AND s2. These represent sets of relations that may be joined together. for s1 := relationSet(1); s1 <= subset/2; s1++ {
Would help to give a hint about why you're dividing subset by 2 here
pkg/sql/opt/xform/join_order_builder.go, line 385 at r2 (raw file):
// Add all relations referenced by the predicate to the SES relationSet. Outer // columns will be ignored. This is correct because outer columns can be
This is confusing, since freeVars contains the outer columns of the predicates. I suppose here you're referring to outer columns that aren't contained in any of the relations?
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)
pkg/sql/opt/norm/testdata/rules/select, line 396 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
If this rule is no longer triggered then I don't think this is testing what it was meant to anymore.
To be honest, I have no idea how to get the rule to trigger at this point; I think it depended on AssociateJoins.
pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
this filter is unnecessary.
I've flirted with the idea of avoiding redundant filters a bit, but I think I remember it resulting in a bit of a slowdown. I'm not sure if this was due to more time spent in the getEdges function or to removing filters that were somehow necessary for the best plan, so I'll try it again and properly profile this time.
pkg/sql/opt/props/func_dep_test.go, line 196 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
this is a duplicate -- maybe you should test 3 2 or 3 1
Right, meant that to be 3 2. Done.
pkg/sql/opt/xform/join_order_builder.go, line 132 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I think it might make sense to unit test some of the methods of this struct in a
join_order_builder_test.gofile. For example, I could imagine a few tests that check that vertexes/edges/plans are built correctly for 2-3 join cases. It worries me that we have only higher-level tests that spew out so much output that it can be hard to detect subtle bugs in the logic.
I've added an opttester command 'reorderjoins' that shows what joinOrderBuilder is doing as it adds orderings to the memo. I introduced a few tricky bugs and it seemed pretty helpful, and even helped me catch a real bug (the rule was matching on the joins that had been added by previous firings).
I'd appreciate any comments on the format of the output, since I don't want to be the only one who can read it.
pkg/sql/opt/xform/join_order_builder.go, line 29 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think it would help to explain why it's a hypergraph -- that doesn't become clear until you look at the code.
I'll add because predicates can reference more than one relation. Is that enough, or did you want more?
pkg/sql/opt/xform/join_order_builder.go, line 89 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
this line is offset from the rest of the tree
I suspect that's due to the way reviewable shows tabs; it looks fine in my IDE. I'll replace the spaces with tabs, and maybe that'll fix it.
pkg/sql/opt/xform/join_order_builder.go, line 92 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
there's no merge join in the above plan....
Huh, I actually meant to replace that example because it isn't very interesting. I'll change the query as well as the comments.
pkg/sql/opt/xform/join_order_builder.go, line 109 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
this line is offset from the rest of the tree
Right, hopefully fixing up tabs will take care of this.
pkg/sql/opt/xform/join_order_builder.go, line 151 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] Plans -> plans
Done.
pkg/sql/opt/xform/join_order_builder.go, line 232 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Shouldn't this be
(((1<<(endIdx-startIdx)) - 1) << startIdx)? i.e., aren't you missing some parentheses?I think
+and-have higher operator precedence than<<and>>
Yes, you're right. I was still getting the correct output for some reason, and when I evaluate 1<<3-1 in go playground I get 7 instead of 4, but parentheses are definitely a good idea anyway. Done.
pkg/sql/opt/xform/join_order_builder.go, line 298 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Would help to give a hint about why you're dividing subset by 2 here
Done.
pkg/sql/opt/xform/join_order_builder.go, line 385 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This is confusing, since
freeVarscontains the outer columns of the predicates. I suppose here you're referring to outer columns that aren't contained in any of the relations?
Yes, that's correct. I'll try and clarify this. Done.
pkg/sql/opt/xform/testdata/external/tpce-no-stats, line 66 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
You should look at each of @nvanbenschoten's hand-tuned queries and see how they compare with what the optimizer selects. Here is a link to one of the files: https://github.com/cockroachlabs/tpc-e/blob/f3f0f57c61ba7e324c0273acf573238e739bcfe2/tier-a/src/txn_harness/broker_volume.rs. @nvanbenschoten, what's best way to experiment with each of these queries in isolation, to see how well the optimizer's plans hold up vs. the hand-tuned versions?
Right, I'll work on this when I get around to adding TPC-E statistics to the plans.
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)
pkg/sql/opt/norm/testdata/rules/select, line 396 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Then I'd either remove the test or just change the comment.
I'll remove it. Done.
pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Sounds good -- it does add some execution overhead and may confuse the statistics code, so it would be good to investigate.
Well, I've added some code to handle this case; it actually seems to have improved planning time, I'd guess because we now generate less merge joins. That said, I do think I'll want to look into optimizing FuncDeps for handling equivalencies. Done.
pkg/sql/opt/testutils/opttester/opt_tester.go, line 166 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] I'd change this to something like
IsJoinLimitSetto make it clear that this field is simply marking whether theJoinLimitflag was set as part of the test.
Done.
pkg/sql/opt/xform/join_order_builder.go, line 29 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Looks good - thanks!
One thing I'm still not completely clear on, though, is whether you'd actually take a predicate like
a = b+cinto account for join ordering. Adding a bit more explanation somewhere (not necessarily here) would help.
I added a bit more a couple sentences down. Done.
pkg/sql/opt/xform/join_order_builder.go, line 47 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[super nit] I think this example would be cognitively easier to follow if you order the tables lexicographically by row count. So maybe have
abas the smallest table, followed bycd,ef, andgh.
Done.
pkg/sql/opt/xform/join_order_builder.go, line 598 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] missing period.
Done.
613d85c to
5985371
Compare
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 17 of 17 files at r4.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @DrewKimball, @nvanbenschoten, and @RaduBerinde)
pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Well, I've added some code to handle this case; it actually seems to have improved planning time, I'd guess because we now generate less merge joins. That said, I do think I'll want to look into optimizing FuncDeps for handling equivalencies. Done.
Looks good! Does this mean we're not choosing merge joins in some cases when we should be? Do we need to consider all equivalencies and not just explicit filters to determine whether we can generate a merge join?
I don't think you need to fix this as part of this PR, but maybe open an issue to investigate later if you think it might be a problem...
pkg/sql/opt/xform/testdata/rules/join, line 674 at r4 (raw file):
│ └── [presentation: a:1,b:2,c:3,x:5,y:6,z:7] │ ├── best: (inner-join G2 G3 G4) <<<<<<< HEAD
This needs some cleanup
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)
pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Looks good! Does this mean we're not choosing merge joins in some cases when we should be? Do we need to consider all equivalencies and not just explicit filters to determine whether we can generate a merge join?
I don't think you need to fix this as part of this PR, but maybe open an issue to investigate later if you think it might be a problem...
Hm, actually I was misunderstanding how merge joins are generated. It doesn't make any difference which equivalencies are used.
New theory: it's because the original join order is (re)enumerated and added to the memo; when I was adding the redundant filters, the redundant plans were considered novel because of the extra filters and then added to the memo and explored. Now, we are guaranteed to use the original filters when possible because I first sort by Scalar ID (which I think is the same as ordering by time of construction) and then remove any redundant filters that follow.
So, the real problem was extra memo groups.
pkg/sql/opt/xform/testdata/rules/join, line 674 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This needs some cleanup
Whoops, forgot to run after rebasing. Done.
b066787 to
09041a3
Compare
rytaft
left a comment
There was a problem hiding this comment.
Looks like you still need to update tpch_vec in the logic tests.
Reviewed 10 of 10 files at r5.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @nvanbenschoten, and @RaduBerinde)
pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Hm, actually I was misunderstanding how merge joins are generated. It doesn't make any difference which equivalencies are used.
New theory: it's because the original join order is (re)enumerated and added to the memo; when I was adding the redundant filters, the redundant plans were considered novel because of the extra filters and then added to the memo and explored. Now, we are guaranteed to use the original filters when possible because I first sort by Scalar ID (which I think is the same as ordering by time of construction) and then remove any redundant filters that follow.
So, the real problem was extra memo groups.
Ok, great! Seems like this is strictly an improvement, then.
|
This is awesome work! I'm really excited about it! |
7aee3da to
f1ad1d8
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @DrewKimball, @nvanbenschoten, @RaduBerinde, and @rytaft)
pkg/sql/opt/norm/rules/citations.md, line 37 at r6 (raw file):
Cost-Based Query Transformation in Oracle.. 1026-1036. https://www.researchgate.net/publication/221311318_Cost-Based_Query_Transformation_in_Oracle z
[nit] maybe this should be in a xform/citations.md file
pkg/sql/opt/props/logical.go, line 281 at r6 (raw file):
// node. It is used to only reorder joins via ReorderJoins up to a certain // limit. JoinSize int
It still only counts inner joins, right?
We should also make the definition more exact. "Number of joins in the expression tree" sounds like the total number of joins inside the tree anywhere. It is really the number of joins in the maximal "complex" of only inner joins starting at the root. I guess a precise definition is: the number of nodes in the root's connected component after removing all non-inner-join nodes in the tree.
pkg/sql/opt/testutils/opttester/opt_tester.go, line 160 at r6 (raw file):
PerturbCost float64 // JoinLimit is the maximum number of joins in a query which the optimizer
[nit] NYC but this definition is meh, maybe just say it is the default value for SessionData.ReorderJoinsLimit so we don't have to define it in multiple places.
pkg/sql/opt/testutils/opttester/opt_tester.go, line 164 at r6 (raw file):
JoinLimit int // IsJoinLimitSet indicates whether opttester should set JoinLimit to a
Is there now a difference between JoinLimit=0 and JoinLimit=1? Maybe if we change how commuting works 0 would mean that we won't even commute a single join?
pkg/sql/opt/testutils/opttester/opt_tester.go, line 390 at r6 (raw file):
} if ot.Flags.IsJoinLimitSet {
This is not your change but I'm confused about this. AFAIK we never get a value for ReorderJoinsLimit from outside the opttester. We initialize to the default in opttester.New(). I think we should just set JoinLimit to the default on init, and set ReorderJoinsLimit here unconditionally.
pkg/sql/opt/testutils/opttester/reorder_joins.go, line 51 at r6 (raw file):
var treeNum = 1 o.JoinOrderBuilder().NotifyOnReorder(
This is very cool infrastructure, well done!
pkg/sql/opt/xform/custom_funcs.go, line 2573 at r6 (raw file):
} // deriveJoinSize returns the number of joins in the join tree rooted at the
Ditto on the definition here being wrong or at least unclear
pkg/sql/opt/xform/custom_funcs.go, line 2593 at r6 (raw file):
// ShouldReorderJoins returns whether the optimizer should attempt to find // a better ordering of inner joins. This is the case if (1) the given
[nit] put (1), (2) on separate lines. Can you expand on (1)? If there is some rule that creates joins, shouldn't those get reordered? Thinking of possible future rules that push group-by through join or vice-versa for example. Maybe another possible "rule" would be - it has to be the first expression, unless the first expression is not a join (?)
pkg/sql/opt/xform/join_order_builder.go, line 30 at r6 (raw file):
// -------------------------- // // First, the initial join tree is traversed and encoded as a graph (technically
[nit] I'd say encoded as a hypergraph (which is a graph in which an edge can join any number of vertices)
pkg/sql/opt/xform/join_order_builder.go, line 49 at r6 (raw file):
// xy rowcount: 10,000,000,000 // // The vertexes of the graph would be initialized with base relations ab,
[nit] s/would be initialized with base relations/represent the base relations
pkg/sql/opt/xform/join_order_builder.go, line 73 at r6 (raw file):
// a = u // // Finally, the DPSube algorithm is executed. DPSube enumerates all disjoint
mention the citation reference after DPSube
pkg/sql/opt/xform/join_order_builder.go, line 74 at r6 (raw file):
// // Finally, the DPSube algorithm is executed. DPSube enumerates all disjoint // pairs of subsets of base relations such as ([xy], [uv]) or ([xy, uv], [cd]).
sets normally use curly brackets not square
pkg/sql/opt/xform/join_order_builder.go, line 77 at r6 (raw file):
// This is done efficiently using bit sets. For each pair of relation subsets, a // join is added to the memo as long as there exists at least one edge that // connects the two subsets. All connecting edges are added to the new join's ON
[nit] add a paren after "the two subsets" (which ensures that we don't generate cross-joins)
pkg/sql/opt/xform/join_order_builder.go, line 263 at r6 (raw file):
// on the join's left and right inputs, so all base relations under the join // were added to jb.vertexes. relSet := relationSet(((1 << (endIdx - startIdx)) - 1) << startIdx)
This formula should be put in a helper method which has some comments.
We could also have a simpler allRelations(n int) function, and do relStart := allRelations(len(jb.vertexes)) and then relEnd := allRelations(len(jb.vertexes)) and relSet := relEnd ^ relStart.
pkg/sql/opt/xform/join_order_builder.go, line 299 at r6 (raw file):
for col2, ok2 := equivGroup.Next(col1 + 1); ok2; col2, ok2 = equivGroup.Next(col2 + 1) { relation1, relation2 := jb.getRelationIndexes(col1, col2) relations := relationSet(1 << relation1).union(relationSet(1 << relation2))
Would be useful to add a singleRelation() function to avoid the <<s everywhere
pkg/sql/opt/xform/join_order_builder.go, line 302 at r6 (raw file):
var1 := jb.f.Memo().MemoizeVariable(col1) var2 := jb.f.Memo().MemoizeVariable(col2) if !jb.hasColEq(jb.edges[relations], col1, col2) {
Are we sure we will actually need all these filter expressions? It may be useful to do it lazily and only construct when needed. Hmm, I guess if the two columns are from different tables, we will build the join between those two tables. Maybe it would help in cases with equal columns within the same table, but those are probably rare.
pkg/sql/opt/xform/join_order_builder.go, line 319 at r6 (raw file):
// TODO(drewk): implement DPHyp (or a similar algorithm). func (jb *JoinOrderBuilder) dpSube() { subsets := relationSet((1 << len(jb.vertexes)) - 1)
allRelations() would be useful here too
pkg/sql/opt/xform/join_order_builder.go, line 329 at r6 (raw file):
// s1 AND s2. These represent sets of relations that may be joined together. // Only iterate s1 up to subset/2 to avoid enumerating duplicate partitions. for s1 := relationSet(1); s1 <= subset/2; s1++ {
I had to think a bit about why subset/2 makes sense.. It's because the union between s1 and s2 is always s1+s2 because the bits are disjoint, would be worth mentioning.
pkg/sql/opt/xform/join_order_builder.go, line 744 at r6 (raw file):
// len returns the number of elements in the set. func (s relationSet) len() int64 {
[nit] it's odd that this returns int64
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)
pkg/sql/opt/props/logical.go, line 281 at r6 (raw file):
Previously, RaduBerinde wrote…
It still only counts inner joins, right?
We should also make the definition more exact. "Number of joins in the expression tree" sounds like the total number of joins inside the tree anywhere. It is really the number of joins in the maximal "complex" of only inner joins starting at the root. I guess a precise definition is: the number of nodes in the root's connected component after removing all non-inner-join nodes in the tree.
Done.
pkg/sql/opt/testutils/opttester/opt_tester.go, line 160 at r6 (raw file):
Previously, RaduBerinde wrote…
[nit] NYC but this definition is meh, maybe just say it is the default value for
SessionData.ReorderJoinsLimitso we don't have to define it in multiple places.
Done.
pkg/sql/opt/testutils/opttester/opt_tester.go, line 164 at r6 (raw file):
Previously, RaduBerinde wrote…
Is there now a difference between JoinLimit=0 and JoinLimit=1? Maybe if we change how commuting works 0 would mean that we won't even commute a single join?
No, there isn't any difference. Are you saying I should add a join size check to CommuteJoin?
pkg/sql/opt/testutils/opttester/opt_tester.go, line 390 at r6 (raw file):
Previously, RaduBerinde wrote…
This is not your change but I'm confused about this. AFAIK we never get a value for ReorderJoinsLimit from outside the opttester. We initialize to the default in
opttester.New(). I think we should just setJoinLimitto the default on init, and setReorderJoinsLimithere unconditionally.
Done.
pkg/sql/opt/testutils/opttester/reorder_joins.go, line 51 at r6 (raw file):
Previously, RaduBerinde wrote…
This is very cool infrastructure, well done!
Thanks!
pkg/sql/opt/xform/custom_funcs.go, line 2573 at r6 (raw file):
Previously, RaduBerinde wrote…
Ditto on the definition here being wrong or at least unclear
Done.
pkg/sql/opt/xform/custom_funcs.go, line 2593 at r6 (raw file):
Previously, RaduBerinde wrote…
[nit] put (1), (2) on separate lines. Can you expand on (1)? If there is some rule that creates joins, shouldn't those get reordered? Thinking of possible future rules that push group-by through join or vice-versa for example. Maybe another possible "rule" would be - it has to be the first expression, unless the first expression is not a join (?)
Correct me if I'm wrong, but I think rules like those would create new groups, right? Anyway, the issue with doing something like you're suggesting is that we'd then also match the commuted and reordered versions of that join within the same group.
I could probably get around this by setting the WasReordered field to true when an expression is added to the group as well as when a new group is created. I would also have to set WasReordered for CommuteJoin. However, the problem is that we would then end up adding duplicate expressions to the memo since the WasReordered bit would result in a different hash and commutation produces duplicates when applied twice.
I think probably the best thing to do is to keep things as they are for now, and then either pull commutation into JoinOrderBuilder once I've implemented DPHyp or remove commutation altogether. Once we've done either of those things, I think something like you've suggested would be more workable.
pkg/sql/opt/xform/join_order_builder.go, line 30 at r6 (raw file):
Previously, RaduBerinde wrote…
[nit] I'd say
encoded as a hypergraph (which is a graph in which an edge can join any number of vertices)
Done.
pkg/sql/opt/xform/join_order_builder.go, line 49 at r6 (raw file):
Previously, RaduBerinde wrote…
[nit] s/would be initialized with base relations/represent the base relations
Done.
pkg/sql/opt/xform/join_order_builder.go, line 73 at r6 (raw file):
Previously, RaduBerinde wrote…
mention the citation reference after DPSube
Done.
pkg/sql/opt/xform/join_order_builder.go, line 74 at r6 (raw file):
Previously, RaduBerinde wrote…
sets normally use curly brackets not square
Done.
pkg/sql/opt/xform/join_order_builder.go, line 77 at r6 (raw file):
Previously, RaduBerinde wrote…
[nit] add a paren after "the two subsets"
(which ensures that we don't generate cross-joins)
Done.
pkg/sql/opt/xform/join_order_builder.go, line 263 at r6 (raw file):
Previously, RaduBerinde wrote…
This formula should be put in a helper method which has some comments.
We could also have a simpler
allRelations(n int)function, and dorelStart := allRelations(len(jb.vertexes))and thenrelEnd := allRelations(len(jb.vertexes))andrelSet := relEnd ^ relStart.
Done. This is much nicer.
pkg/sql/opt/xform/join_order_builder.go, line 299 at r6 (raw file):
Previously, RaduBerinde wrote…
Would be useful to add a
singleRelation()function to avoid the<<s everywhere
You've just made me realize that I have an add() function to do this sort of thing. Done.
pkg/sql/opt/xform/join_order_builder.go, line 302 at r6 (raw file):
Previously, RaduBerinde wrote…
Are we sure we will actually need all these filter expressions? It may be useful to do it lazily and only construct when needed. Hmm, I guess if the two columns are from different tables, we will build the join between those two tables. Maybe it would help in cases with equal columns within the same table, but those are probably rare.
Yes, I think any filters that are added during this step will end up being used in some join. Might still be worth playing around with, though; I might be wrong.
pkg/sql/opt/xform/join_order_builder.go, line 319 at r6 (raw file):
Previously, RaduBerinde wrote…
allRelations()would be useful here too
Done.
pkg/sql/opt/xform/join_order_builder.go, line 329 at r6 (raw file):
Previously, RaduBerinde wrote…
I had to think a bit about why
subset/2makes sense.. It's because the union betweens1ands2is alwayss1+s2because the bits are disjoint, would be worth mentioning.
Done. Let me know if my explanation is good enough.
pkg/sql/opt/xform/join_order_builder.go, line 744 at r6 (raw file):
Previously, RaduBerinde wrote…
[nit] it's odd that this returns int64
Done.
pkg/sql/opt/norm/rules/citations.md, line 37 at r6 (raw file):
Previously, RaduBerinde wrote…
[nit] maybe this should be in a
xform/citations.mdfile
Done.
0f629e6 to
373ce33
Compare
|
bors r+ |
51180: opt: add rule to efficiently enumerate join orderings r=DrewKimball a=DrewKimball Previously, the optimizer used associative and commutative rules to reorder join trees. This method is limited in the number of joins that can be reordered without prohibitively long planning time. The long planning time is largely a result of two factors: 1. Applying these rules to every memo expression produces many duplicate plans. While the duplicates aren't actually added to the memo, they are still produced often enough to increase planning time. 2. Associative and commutative rules produce join orders in which cross joins are created which did not previously exist. These plans are rarely the best, since cross join output cardinalities are large except when the input cardinalities are very small. As an example of the second problem, take the following queries: ``` SELECT * FROM xy INNER JOIN ab ON x = a INNER JOIN uv ON a = u; SELECT * FROM xy INNER JOIN uv ON True INNER JOIN ab ON x = a AND a = u; ``` These queries are equivalent, but the second one is most likely slower than the first. This patch adds a library that constructs a graph representing the join tree to be reordered. The graph is then used to enumerate all valid join orderings all at once without duplicates, excluding those that involve creating cross joins. Only the first expression of each memo group is reordered. This addresses the above two problems, which significantly improves the time complexity of join reordering. Because of the planning time decrease, the default reorder_joins_limit can be increased. In addition, transitive closure is ensured for the join filters during contruction of the join graph. In the example query above, the x = a and a = u filters imply an x = u filter. This will become part of the join graph and will be used in constructing join trees, so an order like the following will now be possible without introducing a cross join: ``` SELECT * FROM xy INNER JOIN uv ON x = u INNER JOIN ab ON x = a AND a = u; ``` Finally, the method used to enumerate valid join orderings is extensible to non-inner joins. This will be left for a future PR. Closes #43039 Release note: None Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
Build failed |
Previously, the optimizer used associative and commutative rules to reorder join trees. This method is limited in the number of joins that can be reordered without prohibitively long planning time. The long planning time is largely a result of two factors: 1. Applying these rules to every memo expression produces many duplicate plans. While the duplicates aren't actually added to the memo, they are still produced often enough to increase planning time. 2. Associative and commutative rules produce join orders in which cross joins are created which did not previously exist. These plans are rarely the best, since cross join output cardinalities are large except when the input cardinalities are very small. As an example of the second problem, take the following queries: ``` SELECT * FROM xy INNER JOIN ab ON x = a INNER JOIN uv ON a = u; SELECT * FROM xy INNER JOIN uv ON True INNER JOIN ab ON x = a AND a = u; ``` These queries are equivalent, but the second one is most likely slower than the first. This patch adds a library that constructs a graph representing the join tree to be reordered. The graph is then used to enumerate all valid join orderings all at once without duplicates, excluding those that involve creating cross joins. Only the first expression of each memo group is reordered. This addresses the above two problems, which significantly improves the time complexity of join reordering. Because of the planning time decrease, the default reorder_joins_limit can be increased. In addition, transitive closure is ensured for the join filters during contruction of the join graph. In the example query above, the x = a and a = u filters imply an x = u filter. This will become part of the join graph and will be used in constructing join trees, so an order like the following will now be possible without introducing a cross join: ``` SELECT * FROM xy INNER JOIN uv ON x = u INNER JOIN ab ON x = a AND a = u; ``` Finally, the method used to enumerate valid join orderings is extensible to non-inner joins. This will be left for a future PR. Closes cockroachdb#43039 Release note: None
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)
|
bors r+ |
Build succeeded |




Previously, the optimizer used associative and commutative rules
to reorder join trees. This method is limited in the number of
joins that can be reordered without prohibitively long planning
time. The long planning time is largely a result of two factors:
duplicate plans. While the duplicates aren't actually added to
the memo, they are still produced often enough to increase
planning time.
cross joins are created which did not previously exist. These
plans are rarely the best, since cross join output cardinalities
are large except when the input cardinalities are very small.
As an example of the second problem, take the following queries:
These queries are equivalent, but the second one is most likely
slower than the first.
This patch adds a library that constructs a graph representing the
join tree to be reordered. The graph is then used to enumerate all
valid join orderings all at once without duplicates, excluding those
that involve creating cross joins. Only the first expression of each
memo group is reordered. This addresses the above two problems,
which significantly improves the time complexity of join reordering.
Because of the planning time decrease, the default reorder_joins_limit
can be increased.
In addition, transitive closure is ensured for the join filters
during contruction of the join graph. In the example query above, the
x = a and a = u filters imply an x = u filter. This will become part
of the join graph and will be used in constructing join trees, so an
order like the following will now be possible without introducing a
cross join:
Finally, the method used to enumerate valid join orderings is extensible
to non-inner joins. This will be left for a future PR.
Closes #43039
Release note: None