opt: add rule to efficiently enumerate join orderings by DrewKimball · Pull Request #51180 · cockroachdb/cockroach

DrewKimball · 2020-07-08T23:21:13Z

Previously, the optimizer used associative and commutative rules
to reorder join trees. This method is limited in the number of
joins that can be reordered without prohibitively long planning
time. The long planning time is largely a result of two factors:

Applying these rules to every memo expression produces many
duplicate plans. While the duplicates aren't actually added to
the memo, they are still produced often enough to increase
planning time.
Associative and commutative rules produce join orders in which
cross joins are created which did not previously exist. These
plans are rarely the best, since cross join output cardinalities
are large except when the input cardinalities are very small.

As an example of the second problem, take the following queries:

SELECT * FROM xy INNER JOIN ab ON x = a INNER JOIN uv ON a = u;
SELECT * FROM xy INNER JOIN uv ON True INNER JOIN ab ON x = a AND a = u;

These queries are equivalent, but the second one is most likely
slower than the first.

This patch adds a library that constructs a graph representing the
join tree to be reordered. The graph is then used to enumerate all
valid join orderings all at once without duplicates, excluding those
that involve creating cross joins. Only the first expression of each
memo group is reordered. This addresses the above two problems,
which significantly improves the time complexity of join reordering.
Because of the planning time decrease, the default reorder_joins_limit
can be increased.

In addition, transitive closure is ensured for the join filters
during contruction of the join graph. In the example query above, the
x = a and a = u filters imply an x = u filter. This will become part
of the join graph and will be used in constructing join trees, so an
order like the following will now be possible without introducing a
cross join:

SELECT * FROM xy INNER JOIN uv ON x = u INNER JOIN ab ON x = a AND a = u;

Finally, the method used to enumerate valid join orderings is extensible
to non-inner joins. This will be left for a future PR.

Closes #43039

Release note: None

cockroach-teamcity · 2020-07-08T23:21:20Z

This change is

DrewKimball · 2020-07-08T23:32:42Z

Stats for TPC-H queries:

Note that the Q7 plan is slower for higher join limits - this is because of costing error.

Also note that a join limit of 3 on master corresponds to the command SET CLUSTER SETTING sql.defaults.reorder_joins_limit=4 because the original limit actually referred to a number of base (non-join) relations, rather than the number of joins. I've changed reorder_joins_limit to instead refer to the number of joins in the join tree, since this seems more intuitive (also, the comment for the ReorderJoinsLimit constant stated that DefaultJoinOrderLimit denotes the default limit on the number of joins to reorder, so I must not be the only one who thinks this is more intuitive)

DrewKimball · 2020-07-08T23:39:37Z

Planning times for this worst-case query with different join limits for master vs rule:

SELECT *
FROM xy AS xy1,
     xy AS xy2,
     xy AS xy3,
     xy AS xy4,
     xy AS xy5,
     xy AS xy6,
     xy AS xy7,
     xy AS xy8,
     xy AS xy9,
     xy AS xy10,
     xy AS xy11
WHERE xy1.x = xy2.x
AND   xy2.x = xy3.x
AND   xy3.x = xy4.x
AND   xy4.x = xy5.x
AND   xy5.x = xy6.x
AND   xy6.x = xy7.x
AND   xy7.x = xy8.x
AND   xy8.x = xy9.x
AND   xy9.x = xy10.x
AND   xy10.x = xy11.x

This is a worst-case query because ReorderJoins only enumerates orderings that do not introduce new cross joins. Because all base relations in this query are connected, all join orderings are considered worth adding to the memo, and so planning time grows very quickly with join tree size.

As with my other comment, keep in mind that (for example) the '1 Join' column in the table corresponds to the command SET CLUSTER SETTING sql.defaults.reorder_joins_limit=2 for master. I've since rectified this for the rule.

DrewKimball · 2020-07-08T23:40:48Z

I'm thinking a default join limit of 8 joins is a good place to start, but other opinions are welcome.

DrewKimball · 2020-07-08T23:46:51Z

Next Steps:

Add a nice debug output. This will make debugging much easier both for me and whoever maintains the code after me.
Add support for non-inner joins. This is one of the main reasons for making this switch in the first place.
Implement a more efficient join order enumeration algorithm (perhaps DPHyp). DPSub is what I currently have, and while it's nice and simple, it is very inefficient in cases where only a small fraction of the join-order search space is valid (because a large number of orderings have to be enumerated and then discarded).

DrewKimball · 2020-07-09T02:52:01Z

@rytaft I'm seeing some unexpected differences in stats and costs even with the same plans, and even when those plans don't involve inner joins:

Do you know what might cause this?

RaduBerinde · 2020-07-09T03:47:56Z

This is cool! Looking forward to reviewing in detail. From an initial look, the change is surprisingly non-invasive!

I think @justinj might be interested in looking at this too!

Starting with a join limit of 8 for now is reasonable. I think we will probably want to make it much higher, and add a limit on the number of orderings that we examine (effectively putting a limit on the planning time). It would be a pity to not take advantage of the algorithm for a large number of joins where we can still efficiently enumerate all (non-cross-join inducing) orderings.

The costs you asked about changed because you added some testcases for which you injected stats for those tables. Here's the diff for the plans:

DrewKimball

Ah, thanks Radu. Should have thought of that.

Reviewable status: complete! 0 of 0 LGTMs obtained

justinj · 2020-07-09T04:05:38Z

Very cool! I also didn't look in huge detail but very exciting!

andy-kimball · 2020-07-09T04:22:11Z

Does this PR improve the planning time for #43039?

andy-kimball

, some of my feedback is already incorporated, I left comments for some other things I noticed. Biggest thing is that I think you should unit test the joinOrderBuilder.go given it's a good-sized file with some tricky cases. Overall, it's great work and will be a big step forward for the optimizer on its path towards being best-in-industry. : )

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)

pkg/sql/opt/xform/join_order_builder.go, line 132 at r1 (raw file):

//
// Citations: [8]
type joinOrderBuilder struct {

I think it might make sense to unit test some of the methods of this struct in a join_order_builder_test.go file. For example, I could imagine a few tests that check that vertexes/edges/plans are built correctly for 2-3 join cases. It worries me that we have only higher-level tests that spew out so much output that it can be hard to detect subtle bugs in the logic.

pkg/sql/opt/xform/testdata/external/tpce-no-stats, line 66 at r1 (raw file):

      │    │    │    ├── multiplicity: left-rows(zero-or-more), right-rows(zero-or-one)
      │    │    │    ├── fd: ()-->(1,2,5), (6)-->(9), (15)-->(20), (37)-->(39), (39)-->(37), (36)==(37), (37)==(36), (15)==(33), (33)==(15), (6)==(20), (20)==(6), (3)==(9), (9)==(3), (1)==(5), (5)==(1)
      │    │    │    ├── scan broker@secondary

You should look at each of @nvanbenschoten's hand-tuned queries and see how they compare with what the optimizer selects. Here is a link to one of the files: https://github.com/cockroachlabs/tpc-e/blob/f3f0f57c61ba7e324c0273acf573238e739bcfe2/tier-a/src/txn_harness/broker_volume.rs. @nvanbenschoten, what's best way to experiment with each of these queries in isolation, to see how well the optimizer's plans hold up vs. the hand-tuned versions?

pkg/sql/opt/xform/testdata/rules/join, line 467 at r1 (raw file):


# Case where an implicit filter is used in the final plan (abc.a = pqr.p in the
# merge join) and equivalence closure is ensured for constant equalities (note

Update comment, since you're not putting constant equality support into this PR.

pkg/sql/opt/xform/testdata/rules/join, line 3726 at r1 (raw file):

SELECT * FROM (SELECT * FROM pqr ORDER BY q LIMIT 5) INNER JOIN abc ON a=r
----
inner-join (hash)

How come this one switched from lookup to hash? I'd think lookup is better in this case.

yuzefovich · 2020-07-09T05:26:34Z

I see that several plans for TPCH queries have changed, and it might be interesting to see how the execution runtimes of those queries have changed. I've actually just merged a tpchvec/bench roachtest variant that could be used to run the "benchmark" semi-automatically. The rough outline of its usage:

build cockroach binary for linux with this change (as well as bin/workload tool)
modify how we register tpchvec/bench by specifying different reorder_join_limit setting (old and new) and the queries to run (also commenting out Skip on that config)
run it with something like ./bin/roachtest run tpchvec/bench -u yahor --cockroach=cockroach-linux --workload=workload-linux.

yuzefovich · 2020-07-09T05:38:29Z

Oh, I didn't notice the second table in the image above that contains the runtime comparison already. Nice speedup on Q2!

RaduBerinde

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)

pkg/sql/opt/xform/rules/join.opt, line 11 at r1 (raw file):

# not reordered. For more information, see the comment in join_order_builder.go.
#
# TODO(drewk): Currently, ReorderJoins will match each join expression in a

This would be a significant optimization, right? We'll have to brainstorm a way to do it in the future. I don't see how without doing a special traversal of the expression before exploration starts.

It may be worthwhile to see how much difference it would make (you could hackShouldReorderJoins and hardcode a specific JoinSize for a test query, maybe the worst case query you mentioned)

nvb · 2020-07-09T17:09:15Z

You should look at each of @nvanbenschoten's hand-tuned queries and see how they compare with what the optimizer selects. Here is a link to one of the files: https://github.com/cockroachlabs/tpc-e/blob/f3f0f57c61ba7e324c0273acf573238e739bcfe2/tier-a/src/txn_harness/broker_volume.rs. @nvanbenschoten, what's best way to experiment with each of these queries in isolation, to see how well the optimizer's plans hold up vs. the hand-tuned versions?

I believe @DrewKimball has TPC-E 5k up and running on a roachprod cluster now, so the best way to experiment with the queries in isolation is probably just to run them with and without hand-tuning on that cluster while the load is running. You'll need to bump up the sql.defaults.reorder_joins_limit setting for some of the queries.

The three important queries to focus on in TPC-E are:

DrewKimball · 2020-07-09T18:29:50Z

pkg/sql/opt/xform/rules/join.opt, line 11 at r1 (raw file):

Previously, RaduBerinde wrote…

This would be a significant optimization, right? We'll have to brainstorm a way to do it in the future. I don't see how without doing a special traversal of the expression before exploration starts.

It may be worthwhile to see how much difference it would make (you could hackShouldReorderJoins and hardcode a specific JoinSize for a test query, maybe the worst case query you mentioned)

It looks like it shaves a couple 10's of ms off of planning time for that query with join limit = 8. That said, you're right that it's difficult to see a non-messy way to do this. I think the best solution is probably to simply implement a more efficient algorithm than DPSub, and see where that gets us.

DrewKimball · 2020-07-09T18:32:57Z

pkg/sql/opt/xform/testdata/rules/join, line 3726 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

How come this one switched from lookup to hash? I'd think lookup is better in this case.

That's because the stats changed when I moved the old AssociateJoins tests to the top of the file. Table abc is altered to have only 100 rows in the GenerateInvertedIndexZigzagJoins tests.

DrewKimball

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @DrewKimball, and @RaduBerinde)

pkg/sql/opt/xform/testdata/rules/join, line 467 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Update comment, since you're not putting constant equality support into this PR.

Done.

DrewKimball · 2020-07-09T18:39:07Z

build cockroach binary for linux with this change (as well as bin/workload tool)

Is there a reason why it has to be a linux binary in particular? The way I've been getting stats is a pain, so I certainly wouldn't mind having a better method.

RaduBerinde · 2020-07-09T18:59:01Z

pkg/sql/opt/xform/rules/join.opt, line 11 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

It looks like it shaves a couple 10's of ms off of planning time for that query with join limit = 8. That said, you're right that it's difficult to see a non-messy way to do this. I think the best solution is probably to simply implement a more efficient algorithm than DPSub, and see where that gets us.

Ah, much less than I expected

DrewKimball · 2020-07-09T19:11:32Z

pkg/sql/opt/xform/rules/join.opt, line 11 at r1 (raw file):

Previously, RaduBerinde wrote…

Ah, much less than I expected

Well, with a high join limit, most time is spent optimizing rather than enumerating join orders, so this makes sense. Also, the nature of join reordering is such that the enumerating orders for the largest join tree (in this case with 8 joins) takes more time than the rest combined.

Ideally, I'd like to pull CommuteJoin into the joinOrderBuilder logic. The reason I haven't yet is because we want to commute above the join limit (since commuting is relatively cheap) but DPSub still has to enumerate all conceivable join orders even when only commuting is allowed.

RaduBerinde · 2020-07-09T19:17:01Z

That makes sense.

I did consider at some point making the coster and execbuilder consider both variants of a hash join, in which case we wouldn't care about commuting (outside of join reordering). Some of the rules generating merge and lookup joins would need to be modified, they currently assume that they will run against both versions of the original join.

rytaft

This is really exciting to see! My main comment is that the final plan often ends up with some unnecessary filters due to the equality closure. It would be good if you can find a way to remove the unnecessary filters.

Also added some comments inline.

Reviewed 38 of 39 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @DrewKimball, and @RaduBerinde)

pkg/sql/opt/norm/testdata/rules/select, line 396 at r2 (raw file):

# Regression test for #42035. This test uses the opt directive because the rule
# is triggered during exploration.
opt expect=ConsolidateSelectFilters disable=(InlineConstVar,RightAssociateJoinsLeft,RightAssociateJoinsRight)

If this rule is no longer triggered then I don't think this is testing what it was meant to anymore.

pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):

      │    │         └── abc.a:4 = ac.a:10
      │    └── filters
      │         ├── abc.a:4 = ab.a:7

this filter is unnecessary.

pkg/sql/opt/props/func_dep_test.go, line 196 at r2 (raw file):

		{col1: 2, col2: 1, expected: true},
		{col1: 1, col2: 3, expected: true},
		{col1: 2, col2: 3, expected: true},

this is a duplicate -- maybe you should test 3 2 or 3 1

pkg/sql/opt/xform/join_order_builder.go, line 29 at r2 (raw file):

//
// First, the initial join tree is traversed and encoded as a graph (technically
// a hypergraph). The 'leaves' of the join tree become the vertexes of the

I think it would help to explain why it's a hypergraph -- that doesn't become clear until you look at the code.

pkg/sql/opt/xform/join_order_builder.go, line 89 at r2 (raw file):

//			│    └── filters (true)
//			└── filters
//           └── a = x

this line is offset from the rest of the tree

pkg/sql/opt/xform/join_order_builder.go, line 92 at r2 (raw file):

//
// Note that the implicit a = x edge that wasn't present in the original query
// is used in the merge join between xy and uv. Also note that the xy and ab

there's no merge join in the above plan....

pkg/sql/opt/xform/join_order_builder.go, line 109 at r2 (raw file):

//			│    └── filters (true)
//			└── filters
//           └── a = u

this line is offset from the rest of the tree

pkg/sql/opt/xform/join_order_builder.go, line 151 at r2 (raw file):

	edges map[relationSet]memo.FiltersExpr

	// Plans maps from a set of base relations to the memo group for the join tree

[nit] Plans -> plans

pkg/sql/opt/xform/join_order_builder.go, line 232 at r2 (raw file):

		// on the join's left and right inputs, so all base relations under the join
		// were added to jb.vertexes.
		relSet := relationSet((1<<(endIdx-startIdx) - 1) << startIdx)

Shouldn't this be (((1<<(endIdx-startIdx)) - 1) << startIdx)? i.e., aren't you missing some parentheses?

I think + and - have higher operator precedence than << and >>

pkg/sql/opt/xform/join_order_builder.go, line 298 at r2 (raw file):

		// Enumerate all possible pairwise-disjoint binary partitions of the subset,
		// s1 AND s2. These represent sets of relations that may be joined together.
		for s1 := relationSet(1); s1 <= subset/2; s1++ {

Would help to give a hint about why you're dividing subset by 2 here

pkg/sql/opt/xform/join_order_builder.go, line 385 at r2 (raw file):

	// Add all relations referenced by the predicate to the SES relationSet. Outer
	// columns will be ignored. This is correct because outer columns can be

This is confusing, since freeVars contains the outer columns of the predicates. I suppose here you're referring to outer columns that aren't contained in any of the relations?

DrewKimball

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)

pkg/sql/opt/norm/testdata/rules/select, line 396 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

If this rule is no longer triggered then I don't think this is testing what it was meant to anymore.

To be honest, I have no idea how to get the rule to trigger at this point; I think it depended on AssociateJoins.

pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

this filter is unnecessary.

I've flirted with the idea of avoiding redundant filters a bit, but I think I remember it resulting in a bit of a slowdown. I'm not sure if this was due to more time spent in the getEdges function or to removing filters that were somehow necessary for the best plan, so I'll try it again and properly profile this time.

pkg/sql/opt/props/func_dep_test.go, line 196 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

this is a duplicate -- maybe you should test 3 2 or 3 1

Right, meant that to be 3 2. Done.

pkg/sql/opt/xform/join_order_builder.go, line 132 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think it might make sense to unit test some of the methods of this struct in a join_order_builder_test.go file. For example, I could imagine a few tests that check that vertexes/edges/plans are built correctly for 2-3 join cases. It worries me that we have only higher-level tests that spew out so much output that it can be hard to detect subtle bugs in the logic.

I've added an opttester command 'reorderjoins' that shows what joinOrderBuilder is doing as it adds orderings to the memo. I introduced a few tricky bugs and it seemed pretty helpful, and even helped me catch a real bug (the rule was matching on the joins that had been added by previous firings).

I'd appreciate any comments on the format of the output, since I don't want to be the only one who can read it.

pkg/sql/opt/xform/join_order_builder.go, line 29 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think it would help to explain why it's a hypergraph -- that doesn't become clear until you look at the code.

I'll add because predicates can reference more than one relation. Is that enough, or did you want more?

pkg/sql/opt/xform/join_order_builder.go, line 89 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

this line is offset from the rest of the tree

I suspect that's due to the way reviewable shows tabs; it looks fine in my IDE. I'll replace the spaces with tabs, and maybe that'll fix it.

pkg/sql/opt/xform/join_order_builder.go, line 92 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

there's no merge join in the above plan....

Huh, I actually meant to replace that example because it isn't very interesting. I'll change the query as well as the comments.

pkg/sql/opt/xform/join_order_builder.go, line 109 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

this line is offset from the rest of the tree

Right, hopefully fixing up tabs will take care of this.

pkg/sql/opt/xform/join_order_builder.go, line 151 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] Plans -> plans

Done.

pkg/sql/opt/xform/join_order_builder.go, line 232 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Shouldn't this be (((1<<(endIdx-startIdx)) - 1) << startIdx)? i.e., aren't you missing some parentheses?

I think + and - have higher operator precedence than << and >>

Yes, you're right. I was still getting the correct output for some reason, and when I evaluate 1<<3-1 in go playground I get 7 instead of 4, but parentheses are definitely a good idea anyway. Done.

pkg/sql/opt/xform/join_order_builder.go, line 298 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Would help to give a hint about why you're dividing subset by 2 here

Done.

pkg/sql/opt/xform/join_order_builder.go, line 385 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This is confusing, since freeVars contains the outer columns of the predicates. I suppose here you're referring to outer columns that aren't contained in any of the relations?

Yes, that's correct. I'll try and clarify this. Done.

pkg/sql/opt/xform/testdata/external/tpce-no-stats, line 66 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

You should look at each of @nvanbenschoten's hand-tuned queries and see how they compare with what the optimizer selects. Here is a link to one of the files: https://github.com/cockroachlabs/tpc-e/blob/f3f0f57c61ba7e324c0273acf573238e739bcfe2/tier-a/src/txn_harness/broker_volume.rs. @nvanbenschoten, what's best way to experiment with each of these queries in isolation, to see how well the optimizer's plans hold up vs. the hand-tuned versions?

Right, I'll work on this when I get around to adding TPC-E statistics to the plans.

DrewKimball

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)

pkg/sql/opt/norm/testdata/rules/select, line 396 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Then I'd either remove the test or just change the comment.

I'll remove it. Done.

pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Sounds good -- it does add some execution overhead and may confuse the statistics code, so it would be good to investigate.

Well, I've added some code to handle this case; it actually seems to have improved planning time, I'd guess because we now generate less merge joins. That said, I do think I'll want to look into optimizing FuncDeps for handling equivalencies. Done.

pkg/sql/opt/testutils/opttester/opt_tester.go, line 166 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I'd change this to something like IsJoinLimitSet to make it clear that this field is simply marking whether the JoinLimit flag was set as part of the test.

Done.

pkg/sql/opt/xform/join_order_builder.go, line 29 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Looks good - thanks!

One thing I'm still not completely clear on, though, is whether you'd actually take a predicate like a = b+c into account for join ordering. Adding a bit more explanation somewhere (not necessarily here) would help.

I added a bit more a couple sentences down. Done.

pkg/sql/opt/xform/join_order_builder.go, line 47 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[super nit] I think this example would be cognitively easier to follow if you order the tables lexicographically by row count. So maybe have ab as the smallest table, followed by cd, ef, and gh.

Done.

pkg/sql/opt/xform/join_order_builder.go, line 598 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] missing period.

Done.

rytaft

really nice work!

Reviewed 17 of 17 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @DrewKimball, @nvanbenschoten, and @RaduBerinde)

pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Well, I've added some code to handle this case; it actually seems to have improved planning time, I'd guess because we now generate less merge joins. That said, I do think I'll want to look into optimizing FuncDeps for handling equivalencies. Done.

Looks good! Does this mean we're not choosing merge joins in some cases when we should be? Do we need to consider all equivalencies and not just explicit filters to determine whether we can generate a merge join?

I don't think you need to fix this as part of this PR, but maybe open an issue to investigate later if you think it might be a problem...

pkg/sql/opt/xform/testdata/rules/join, line 674 at r4 (raw file):

 │    └── [presentation: a:1,b:2,c:3,x:5,y:6,z:7]
 │         ├── best: (inner-join G2 G3 G4)
<<<<<<< HEAD

This needs some cleanup

DrewKimball

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)

pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Looks good! Does this mean we're not choosing merge joins in some cases when we should be? Do we need to consider all equivalencies and not just explicit filters to determine whether we can generate a merge join?

I don't think you need to fix this as part of this PR, but maybe open an issue to investigate later if you think it might be a problem...

Hm, actually I was misunderstanding how merge joins are generated. It doesn't make any difference which equivalencies are used.

New theory: it's because the original join order is (re)enumerated and added to the memo; when I was adding the redundant filters, the redundant plans were considered novel because of the extra filters and then added to the memo and explored. Now, we are guaranteed to use the original filters when possible because I first sort by Scalar ID (which I think is the same as ordering by time of construction) and then remove any redundant filters that follow.

So, the real problem was extra memo groups.

pkg/sql/opt/xform/testdata/rules/join, line 674 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This needs some cleanup

Whoops, forgot to run after rebasing. Done.

rytaft

Looks like you still need to update tpch_vec in the logic tests.

Reviewed 10 of 10 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @nvanbenschoten, and @RaduBerinde)

pkg/sql/opt/optbuilder/testdata/update_from, line 226 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Hm, actually I was misunderstanding how merge joins are generated. It doesn't make any difference which equivalencies are used.

New theory: it's because the original join order is (re)enumerated and added to the memo; when I was adding the redundant filters, the redundant plans were considered novel because of the extra filters and then added to the memo and explored. Now, we are guaranteed to use the original filters when possible because I first sort by Scalar ID (which I think is the same as ordering by time of construction) and then remove any redundant filters that follow.

So, the real problem was extra memo groups.

Ok, great! Seems like this is strictly an improvement, then.

awoods187 · 2020-07-14T13:28:09Z

This is awesome work! I'm really excited about it!

RaduBerinde

Very cool!

Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @DrewKimball, @nvanbenschoten, @RaduBerinde, and @rytaft)

pkg/sql/opt/norm/rules/citations.md, line 37 at r6 (raw file):

    Cost-Based Query Transformation in Oracle.. 1026-1036.
    https://www.researchgate.net/publication/221311318_Cost-Based_Query_Transformation_in_Oracle z

[nit] maybe this should be in a xform/citations.md file

pkg/sql/opt/props/logical.go, line 281 at r6 (raw file):

		// node. It is used to only reorder joins via ReorderJoins up to a certain
		// limit.
		JoinSize int

It still only counts inner joins, right?

We should also make the definition more exact. "Number of joins in the expression tree" sounds like the total number of joins inside the tree anywhere. It is really the number of joins in the maximal "complex" of only inner joins starting at the root. I guess a precise definition is: the number of nodes in the root's connected component after removing all non-inner-join nodes in the tree.

pkg/sql/opt/testutils/opttester/opt_tester.go, line 160 at r6 (raw file):

	PerturbCost float64

	// JoinLimit is the maximum number of joins in a query which the optimizer

[nit] NYC but this definition is meh, maybe just say it is the default value for SessionData.ReorderJoinsLimit so we don't have to define it in multiple places.

pkg/sql/opt/testutils/opttester/opt_tester.go, line 164 at r6 (raw file):

	JoinLimit int

	// IsJoinLimitSet indicates whether opttester should set JoinLimit to a

Is there now a difference between JoinLimit=0 and JoinLimit=1? Maybe if we change how commuting works 0 would mean that we won't even commute a single join?

pkg/sql/opt/testutils/opttester/opt_tester.go, line 390 at r6 (raw file):

	}

	if ot.Flags.IsJoinLimitSet {

This is not your change but I'm confused about this. AFAIK we never get a value for ReorderJoinsLimit from outside the opttester. We initialize to the default in opttester.New(). I think we should just set JoinLimit to the default on init, and set ReorderJoinsLimit here unconditionally.

pkg/sql/opt/testutils/opttester/reorder_joins.go, line 51 at r6 (raw file):

	var treeNum = 1

	o.JoinOrderBuilder().NotifyOnReorder(

This is very cool infrastructure, well done!

pkg/sql/opt/xform/custom_funcs.go, line 2573 at r6 (raw file):

}

// deriveJoinSize returns the number of joins in the join tree rooted at the

Ditto on the definition here being wrong or at least unclear

pkg/sql/opt/xform/custom_funcs.go, line 2593 at r6 (raw file):

// ShouldReorderJoins returns whether the optimizer should attempt to find
// a better ordering of inner joins. This is the case if (1) the given

[nit] put (1), (2) on separate lines. Can you expand on (1)? If there is some rule that creates joins, shouldn't those get reordered? Thinking of possible future rules that push group-by through join or vice-versa for example. Maybe another possible "rule" would be - it has to be the first expression, unless the first expression is not a join (?)

pkg/sql/opt/xform/join_order_builder.go, line 30 at r6 (raw file):

// --------------------------
//
// First, the initial join tree is traversed and encoded as a graph (technically

[nit] I'd say encoded as a hypergraph (which is a graph in which an edge can join any number of vertices)

pkg/sql/opt/xform/join_order_builder.go, line 49 at r6 (raw file):

// xy rowcount: 10,000,000,000
//
// The vertexes of the graph would be initialized with base relations ab,

[nit] s/would be initialized with base relations/represent the base relations

pkg/sql/opt/xform/join_order_builder.go, line 73 at r6 (raw file):

//		      a = u
//
// Finally, the DPSube algorithm is executed. DPSube enumerates all disjoint

mention the citation reference after DPSube

pkg/sql/opt/xform/join_order_builder.go, line 74 at r6 (raw file):

//
// Finally, the DPSube algorithm is executed. DPSube enumerates all disjoint
// pairs of subsets of base relations such as ([xy], [uv]) or ([xy, uv], [cd]).

sets normally use curly brackets not square

pkg/sql/opt/xform/join_order_builder.go, line 77 at r6 (raw file):

// This is done efficiently using bit sets. For each pair of relation subsets, a
// join is added to the memo as long as there exists at least one edge that
// connects the two subsets. All connecting edges are added to the new join's ON

[nit] add a paren after "the two subsets" (which ensures that we don't generate cross-joins)

pkg/sql/opt/xform/join_order_builder.go, line 263 at r6 (raw file):

		// on the join's left and right inputs, so all base relations under the join
		// were added to jb.vertexes.
		relSet := relationSet(((1 << (endIdx - startIdx)) - 1) << startIdx)

This formula should be put in a helper method which has some comments.

We could also have a simpler allRelations(n int) function, and do relStart := allRelations(len(jb.vertexes)) and then relEnd := allRelations(len(jb.vertexes)) and relSet := relEnd ^ relStart.

pkg/sql/opt/xform/join_order_builder.go, line 299 at r6 (raw file):

			for col2, ok2 := equivGroup.Next(col1 + 1); ok2; col2, ok2 = equivGroup.Next(col2 + 1) {
				relation1, relation2 := jb.getRelationIndexes(col1, col2)
				relations := relationSet(1 << relation1).union(relationSet(1 << relation2))

Would be useful to add a singleRelation() function to avoid the <<s everywhere

pkg/sql/opt/xform/join_order_builder.go, line 302 at r6 (raw file):

				var1 := jb.f.Memo().MemoizeVariable(col1)
				var2 := jb.f.Memo().MemoizeVariable(col2)
				if !jb.hasColEq(jb.edges[relations], col1, col2) {

Are we sure we will actually need all these filter expressions? It may be useful to do it lazily and only construct when needed. Hmm, I guess if the two columns are from different tables, we will build the join between those two tables. Maybe it would help in cases with equal columns within the same table, but those are probably rare.

pkg/sql/opt/xform/join_order_builder.go, line 319 at r6 (raw file):

// TODO(drewk): implement DPHyp (or a similar algorithm).
func (jb *JoinOrderBuilder) dpSube() {
	subsets := relationSet((1 << len(jb.vertexes)) - 1)

allRelations() would be useful here too

pkg/sql/opt/xform/join_order_builder.go, line 329 at r6 (raw file):

		// s1 AND s2. These represent sets of relations that may be joined together.
		// Only iterate s1 up to subset/2 to avoid enumerating duplicate partitions.
		for s1 := relationSet(1); s1 <= subset/2; s1++ {

I had to think a bit about why subset/2 makes sense.. It's because the union between s1 and s2 is always s1+s2 because the bits are disjoint, would be worth mentioning.

pkg/sql/opt/xform/join_order_builder.go, line 744 at r6 (raw file):

// len returns the number of elements in the set.
func (s relationSet) len() int64 {

[nit] it's odd that this returns int64

DrewKimball

Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)

pkg/sql/opt/props/logical.go, line 281 at r6 (raw file):

Previously, RaduBerinde wrote…

It still only counts inner joins, right?

We should also make the definition more exact. "Number of joins in the expression tree" sounds like the total number of joins inside the tree anywhere. It is really the number of joins in the maximal "complex" of only inner joins starting at the root. I guess a precise definition is: the number of nodes in the root's connected component after removing all non-inner-join nodes in the tree.

Done.

pkg/sql/opt/testutils/opttester/opt_tester.go, line 160 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] NYC but this definition is meh, maybe just say it is the default value for SessionData.ReorderJoinsLimit so we don't have to define it in multiple places.

Done.

pkg/sql/opt/testutils/opttester/opt_tester.go, line 164 at r6 (raw file):

Previously, RaduBerinde wrote…

Is there now a difference between JoinLimit=0 and JoinLimit=1? Maybe if we change how commuting works 0 would mean that we won't even commute a single join?

No, there isn't any difference. Are you saying I should add a join size check to CommuteJoin?

pkg/sql/opt/testutils/opttester/opt_tester.go, line 390 at r6 (raw file):

Previously, RaduBerinde wrote…

This is not your change but I'm confused about this. AFAIK we never get a value for ReorderJoinsLimit from outside the opttester. We initialize to the default in opttester.New(). I think we should just set JoinLimit to the default on init, and set ReorderJoinsLimit here unconditionally.

Done.

pkg/sql/opt/testutils/opttester/reorder_joins.go, line 51 at r6 (raw file):

Previously, RaduBerinde wrote…

This is very cool infrastructure, well done!

Thanks!

pkg/sql/opt/xform/custom_funcs.go, line 2573 at r6 (raw file):

Previously, RaduBerinde wrote…

Ditto on the definition here being wrong or at least unclear

Done.

pkg/sql/opt/xform/custom_funcs.go, line 2593 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] put (1), (2) on separate lines. Can you expand on (1)? If there is some rule that creates joins, shouldn't those get reordered? Thinking of possible future rules that push group-by through join or vice-versa for example. Maybe another possible "rule" would be - it has to be the first expression, unless the first expression is not a join (?)

Correct me if I'm wrong, but I think rules like those would create new groups, right? Anyway, the issue with doing something like you're suggesting is that we'd then also match the commuted and reordered versions of that join within the same group.

I could probably get around this by setting the WasReordered field to true when an expression is added to the group as well as when a new group is created. I would also have to set WasReordered for CommuteJoin. However, the problem is that we would then end up adding duplicate expressions to the memo since the WasReordered bit would result in a different hash and commutation produces duplicates when applied twice.

I think probably the best thing to do is to keep things as they are for now, and then either pull commutation into JoinOrderBuilder once I've implemented DPHyp or remove commutation altogether. Once we've done either of those things, I think something like you've suggested would be more workable.

pkg/sql/opt/xform/join_order_builder.go, line 30 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] I'd say encoded as a hypergraph (which is a graph in which an edge can join any number of vertices)

Done.

pkg/sql/opt/xform/join_order_builder.go, line 49 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] s/would be initialized with base relations/represent the base relations

Done.

pkg/sql/opt/xform/join_order_builder.go, line 73 at r6 (raw file):

Previously, RaduBerinde wrote…

mention the citation reference after DPSube

Done.

pkg/sql/opt/xform/join_order_builder.go, line 74 at r6 (raw file):

Previously, RaduBerinde wrote…

sets normally use curly brackets not square

Done.

pkg/sql/opt/xform/join_order_builder.go, line 77 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] add a paren after "the two subsets" (which ensures that we don't generate cross-joins)

Done.

pkg/sql/opt/xform/join_order_builder.go, line 263 at r6 (raw file):

Previously, RaduBerinde wrote…

This formula should be put in a helper method which has some comments.

We could also have a simpler allRelations(n int) function, and do relStart := allRelations(len(jb.vertexes)) and then relEnd := allRelations(len(jb.vertexes)) and relSet := relEnd ^ relStart.

Done. This is much nicer.

pkg/sql/opt/xform/join_order_builder.go, line 299 at r6 (raw file):

Previously, RaduBerinde wrote…

Would be useful to add a singleRelation() function to avoid the <<s everywhere

You've just made me realize that I have an add() function to do this sort of thing. Done.

pkg/sql/opt/xform/join_order_builder.go, line 302 at r6 (raw file):

Previously, RaduBerinde wrote…

Are we sure we will actually need all these filter expressions? It may be useful to do it lazily and only construct when needed. Hmm, I guess if the two columns are from different tables, we will build the join between those two tables. Maybe it would help in cases with equal columns within the same table, but those are probably rare.

Yes, I think any filters that are added during this step will end up being used in some join. Might still be worth playing around with, though; I might be wrong.

pkg/sql/opt/xform/join_order_builder.go, line 319 at r6 (raw file):

Previously, RaduBerinde wrote…

allRelations() would be useful here too

Done.

pkg/sql/opt/xform/join_order_builder.go, line 329 at r6 (raw file):

Previously, RaduBerinde wrote…

I had to think a bit about why subset/2 makes sense.. It's because the union between s1 and s2 is always s1+s2 because the bits are disjoint, would be worth mentioning.

Done. Let me know if my explanation is good enough.

pkg/sql/opt/xform/join_order_builder.go, line 744 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] it's odd that this returns int64

Done.

pkg/sql/opt/norm/rules/citations.md, line 37 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] maybe this should be in a xform/citations.md file

Done.

DrewKimball · 2020-07-18T22:08:06Z

bors r+

51180: opt: add rule to efficiently enumerate join orderings r=DrewKimball a=DrewKimball Previously, the optimizer used associative and commutative rules to reorder join trees. This method is limited in the number of joins that can be reordered without prohibitively long planning time. The long planning time is largely a result of two factors: 1. Applying these rules to every memo expression produces many duplicate plans. While the duplicates aren't actually added to the memo, they are still produced often enough to increase planning time. 2. Associative and commutative rules produce join orders in which cross joins are created which did not previously exist. These plans are rarely the best, since cross join output cardinalities are large except when the input cardinalities are very small. As an example of the second problem, take the following queries: ``` SELECT * FROM xy INNER JOIN ab ON x = a INNER JOIN uv ON a = u; SELECT * FROM xy INNER JOIN uv ON True INNER JOIN ab ON x = a AND a = u; ``` These queries are equivalent, but the second one is most likely slower than the first. This patch adds a library that constructs a graph representing the join tree to be reordered. The graph is then used to enumerate all valid join orderings all at once without duplicates, excluding those that involve creating cross joins. Only the first expression of each memo group is reordered. This addresses the above two problems, which significantly improves the time complexity of join reordering. Because of the planning time decrease, the default reorder_joins_limit can be increased. In addition, transitive closure is ensured for the join filters during contruction of the join graph. In the example query above, the x = a and a = u filters imply an x = u filter. This will become part of the join graph and will be used in constructing join trees, so an order like the following will now be possible without introducing a cross join: ``` SELECT * FROM xy INNER JOIN uv ON x = u INNER JOIN ab ON x = a AND a = u; ``` Finally, the method used to enumerate valid join orderings is extensible to non-inner joins. This will be left for a future PR. Closes #43039 Release note: None Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>

craig · 2020-07-18T22:51:15Z

Build failed

GitHub CI (Cockroach)

Previously, the optimizer used associative and commutative rules to reorder join trees. This method is limited in the number of joins that can be reordered without prohibitively long planning time. The long planning time is largely a result of two factors: 1. Applying these rules to every memo expression produces many duplicate plans. While the duplicates aren't actually added to the memo, they are still produced often enough to increase planning time. 2. Associative and commutative rules produce join orders in which cross joins are created which did not previously exist. These plans are rarely the best, since cross join output cardinalities are large except when the input cardinalities are very small. As an example of the second problem, take the following queries: ``` SELECT * FROM xy INNER JOIN ab ON x = a INNER JOIN uv ON a = u; SELECT * FROM xy INNER JOIN uv ON True INNER JOIN ab ON x = a AND a = u; ``` These queries are equivalent, but the second one is most likely slower than the first. This patch adds a library that constructs a graph representing the join tree to be reordered. The graph is then used to enumerate all valid join orderings all at once without duplicates, excluding those that involve creating cross joins. Only the first expression of each memo group is reordered. This addresses the above two problems, which significantly improves the time complexity of join reordering. Because of the planning time decrease, the default reorder_joins_limit can be increased. In addition, transitive closure is ensured for the join filters during contruction of the join graph. In the example query above, the x = a and a = u filters imply an x = u filter. This will become part of the join graph and will be used in constructing join trees, so an order like the following will now be possible without introducing a cross join: ``` SELECT * FROM xy INNER JOIN uv ON x = u INNER JOIN ab ON x = a AND a = u; ``` Finally, the method used to enumerate valid join orderings is extensible to non-inner joins. This will be left for a future PR. Closes cockroachdb#43039 Release note: None

RaduBerinde

Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @nvanbenschoten, @RaduBerinde, and @rytaft)

DrewKimball · 2020-07-20T19:59:32Z

bors r+

craig · 2020-07-20T20:57:28Z

Build succeeded

GitHub CI (Cockroach)

DrewKimball requested a review from a team as a code owner July 8, 2020 23:21

DrewKimball force-pushed the join_order branch from b68cce2 to 53d2b48 Compare July 9, 2020 01:55

DrewKimball force-pushed the join_order branch from 53d2b48 to fa1bde1 Compare July 9, 2020 03:32

DrewKimball commented Jul 9, 2020

View reviewed changes

andy-kimball approved these changes Jul 9, 2020

View reviewed changes

RaduBerinde reviewed Jul 9, 2020

View reviewed changes

DrewKimball commented Jul 9, 2020

View reviewed changes

DrewKimball force-pushed the join_order branch from fa1bde1 to 65192ec Compare July 9, 2020 18:35

rytaft reviewed Jul 10, 2020

View reviewed changes

DrewKimball commented Jul 11, 2020

View reviewed changes

DrewKimball force-pushed the join_order branch from 65192ec to 38b894a Compare July 11, 2020 06:07

DrewKimball mentioned this pull request Jul 11, 2020

opt: common 5-way virtual table join takes 45ms to plan #43039

Closed

DrewKimball commented Jul 13, 2020

View reviewed changes

DrewKimball force-pushed the join_order branch 2 times, most recently from 613d85c to 5985371 Compare July 13, 2020 21:00

rytaft approved these changes Jul 13, 2020

View reviewed changes

DrewKimball force-pushed the join_order branch from 5985371 to bd9ed7c Compare July 14, 2020 02:45

DrewKimball commented Jul 14, 2020

View reviewed changes

DrewKimball force-pushed the join_order branch 2 times, most recently from b066787 to 09041a3 Compare July 14, 2020 03:35

rytaft approved these changes Jul 14, 2020

View reviewed changes

DrewKimball force-pushed the join_order branch 3 times, most recently from 7aee3da to f1ad1d8 Compare July 16, 2020 06:12

RaduBerinde reviewed Jul 16, 2020

View reviewed changes

DrewKimball force-pushed the join_order branch from f1ad1d8 to 3dde341 Compare July 17, 2020 01:34

DrewKimball commented Jul 17, 2020

View reviewed changes

DrewKimball force-pushed the join_order branch 3 times, most recently from 0f629e6 to 373ce33 Compare July 18, 2020 08:19

DrewKimball force-pushed the join_order branch from 373ce33 to 9912a17 Compare July 19, 2020 00:12

RaduBerinde approved these changes Jul 20, 2020

View reviewed changes

craig bot merged commit febcd4f into cockroachdb:master Jul 20, 2020

DrewKimball deleted the join_order branch July 20, 2020 21:39

yuzefovich mentioned this pull request Jul 28, 2020

roachtest: tpchvec/perf_no_stats failed #51883

Closed

Conversation

DrewKimball commented Jul 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Jul 8, 2020

Uh oh!

DrewKimball commented Jul 8, 2020

Uh oh!

DrewKimball commented Jul 8, 2020

Uh oh!

DrewKimball commented Jul 8, 2020

Uh oh!

DrewKimball commented Jul 8, 2020

Uh oh!

DrewKimball commented Jul 9, 2020

Uh oh!

RaduBerinde commented Jul 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

justinj commented Jul 9, 2020

Uh oh!

andy-kimball commented Jul 9, 2020

Uh oh!

andy-kimball left a comment

Choose a reason for hiding this comment

Uh oh!

yuzefovich commented Jul 9, 2020

Uh oh!

yuzefovich commented Jul 9, 2020

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

nvb commented Jul 9, 2020

Uh oh!

DrewKimball commented Jul 9, 2020

Uh oh!

DrewKimball commented Jul 9, 2020

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball commented Jul 9, 2020

Uh oh!

RaduBerinde commented Jul 9, 2020

Uh oh!

DrewKimball commented Jul 9, 2020

Uh oh!

RaduBerinde commented Jul 9, 2020

Uh oh!

rytaft left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

rytaft left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

rytaft left a comment

Choose a reason for hiding this comment

Uh oh!

awoods187 commented Jul 14, 2020

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball commented Jul 18, 2020

DrewKimball commented Jul 8, 2020 •

edited

Loading

RaduBerinde commented Jul 9, 2020 •

edited

Loading