opt/props: use EquivGroups in FuncDepSet for tracking equivalences#137571
opt/props: use EquivGroups in FuncDepSet for tracking equivalences#137571craig[bot] merged 4 commits intocockroachdb:masterfrom
Conversation
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
Continuation of #117272. First commit is #137558. Here's an updated slow-queries benchmark against master: |
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 6 of 6 files at r2, 3 of 3 files at r3, 4 of 4 files at r4, 60 of 60 files at r5, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/props/equiv_set.go line 37 at r4 (raw file):
if buildutil.CrdbTestBuild { defer eq.verify() }
nit: Methods that don't mutate the EquivGroup probably don't need to verify the set, do they? Or is this to catch cases where the user incorrectly mutates a ColSet returned from e.g. Group without copying it? In that case, it might be worth a comment mentioning that—Maybe a single comment on verify instead of comments throughout? Up to you.
pkg/sql/opt/props/equiv_set.go line 62 at r4 (raw file):
// GroupForCol returns the group of columns equivalent to the given column. It // returns the empty set if no such group exists. The returned should not be
nit: "The returned should.." => "The returned ColSet should..."
pkg/sql/opt/props/equiv_set.go line 103 at r4 (raw file):
} if eq.groups[i].Contains(right) { return eq.groups[i].Contains(left)
nit: This can be return false since we already checked if the group contains left.
pkg/sql/opt/props/equiv_set.go line 251 at r4 (raw file):
// * In the example, the (1-3) group is split into (1) and (2,3). Since the (1) // group only has a single column, it is discarded as a trivial equivalence. // * The (4-8) group is split into (4,8) and (5,6). Since both subsets have at
nit: (4,8) => (4,7,8)
pkg/sql/opt/props/equiv_set.go line 286 at r4 (raw file):
// AppendFromDisjoint unions the equiv groups from the given EquivGroups with // this one, assuming the groups are disjoint. func (eq *EquivGroups) AppendFromDisjoint(other *EquivGroups) {
nit: make the comment louder about the fact that the groups must be disjoint.
pkg/sql/opt/props/func_dep.go line 648 at r5 (raw file):
for i, ok := cols.Next(0); ok; i, ok = cols.Next(i + 1) { // First check if the column is present in any "to" set of a dependency. // If not, then it is not redundant and must remain in the set. This is
nit: Update this comment to mention the equiv groups
pkg/sql/opt/props/func_dep.go line 1690 at r5 (raw file):
needComma = true from := opt.MakeColSet(col) fmt.Fprintf(b, "%s==%s", from, group.Difference(from))
This verbose formatting of equiv sets is now vestigial. I think it's good to not create extra test output churn in this commit, but maybe we should consider a new format for equivalencies in the near future, like:
fd: (1)-->(2,3), ==(4,5), ==(9-11)
pkg/sql/opt/props/func_dep.go line 1821 at r5 (raw file):
// Non-constant FDs are weaker than equivalence constraints. if f.equiv.AreAllColsEquiv(to.Union(from)) {
Something to consider if we see a lot of allocations here: it may be helpful to have a 2-set version of AreAllColsEquiv to avoid Union.
pkg/sql/opt/props/func_dep.go line 1889 at r5 (raw file):
break } }
nit: In a subsequent PR/commit, consider returning the added equiv colset from AddNoCopy so we don't have to search for it here.
pkg/sql/opt/props/equiv_set_test.go line 73 at r3 (raw file):
} var equivSet EquivGroups
nit: rename equivSet in the renaming commit.
pkg/sql/opt/props/equiv_set.go line 187 at r5 (raw file):
} for i := range fdset.equiv.groups { // No copy is necessary because the equiv groups are immutable.
Is it really "necessary", or just "safe"?
This commit renames `EquivSet` to `EquivGroups` to better reflect the fact that it tracks _groups_ of equivalent groups, rather than a single set of equivalent columns. Epic: None Release note: None
This commit removes the inline buffer from `EquivGroups`, as well as the `NewEquivGroups()` that was previously used to set up the buffer. This simplifies usage, and doesn't appear to affect performance when the set is used in `JoinOrderBuilder` or in `FuncDepSet`. Informs cockroachdb#83963 Release note: None
This commit adds several new methods along with unit tests to `EquivGroups` to prepare its use in tracking equivalencies in `FuncDepSet`. Informs cockroachdb#83963 Release note: None
695c02a to
b784dfe
Compare
DrewKimball
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/props/equiv_set.go line 37 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Methods that don't mutate the EquivGroup probably don't need to verify the set, do they? Or is this to catch cases where the user incorrectly mutates a ColSet returned from e.g.
Groupwithout copying it? In that case, it might be worth a comment mentioning that—Maybe a single comment onverifyinstead of comments throughout? Up to you.
Yes, that's exactly why. Done.
pkg/sql/opt/props/equiv_set.go line 62 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: "The returned should.." => "The returned ColSet should..."
Done.
pkg/sql/opt/props/equiv_set.go line 103 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: This can be
return falsesince we already checked if the group containsleft.
Oh, good point. I ended up restructuring it to check containsLeft || containsRight, and then just return containsLeft && containsRight.
pkg/sql/opt/props/equiv_set.go line 251 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit:
(4,8)=>(4,7,8)
Done. Good catch
pkg/sql/opt/props/equiv_set.go line 286 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: make the comment louder about the fact that the groups must be disjoint.
Done.
pkg/sql/opt/props/equiv_set.go line 187 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is it really "necessary", or just "safe"?
Confusing phrasing. I just meant we don't have to copy in order to be safe. I changed the wording a bit.
pkg/sql/opt/props/func_dep.go line 648 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Update this comment to mention the equiv groups
Done.
pkg/sql/opt/props/func_dep.go line 1690 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This verbose formatting of equiv sets is now vestigial. I think it's good to not create extra test output churn in this commit, but maybe we should consider a new format for equivalencies in the near future, like:
fd: (1)-->(2,3), ==(4,5), ==(9-11)
Yeah, I like that. Or maybe like this:
fd: (1)-->(2,3) equiv: (4,5), (9-11)
pkg/sql/opt/props/func_dep.go line 1821 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Something to consider if we see a lot of allocations here: it may be helpful to have a 2-set version of
AreAllColsEquivto avoidUnion.
Good idea. I wonder if there are any other places where that could help. Do you think it's worth leaving a TODO?
pkg/sql/opt/props/func_dep.go line 1889 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: In a subsequent PR/commit, consider returning the added equiv colset from
AddNoCopyso we don't have to search for it here.
Good idea. I added it to the commit that introduces the new methods and tests.
pkg/sql/opt/props/equiv_set_test.go line 73 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: rename
equivSetin the renaming commit.
Done.
b784dfe to
3aff919
Compare
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 1 of 3 files at r7, 3 of 4 files at r8, 63 of 63 files at r9, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/props/func_dep.go line 1821 at r5 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Good idea. I wonder if there are any other places where that could help. Do you think it's worth leaving a TODO?
Probably not necessary.
Previously, `FuncDepSet` was fairly wasteful in how it tracked sets of equivalent columns: for each column in the equiv group, an FD was maintained from that column to all other columns in the group. This meant that there were `2n` `ColSets` for each equiv group (where `n` is the number of columns in the group). This patch modifies `FuncDepSet` and its internals to use `props.EquivGroups` instead, which keeps a single `ColSet` for each equiv group. This significantly cuts down on allocations for queries with many columns and equalities, both because less `ColSets` spill to heap, and because less FDs are added to the `deps` slice. Fixes cockroachdb#83963 Release note: None
3aff919 to
15b7d7e
Compare
|
TFTR! bors r+ |
props: don't leave unmerged equiv groups in EquivSet
This commit fixes a bug in
EquivSet.tryMergeGroups, which could causean invariant (that equiv groups are non-intersecting) to be violated.
This would potentially cause re-ordered joins to have redundant equality
filters, and prevent join filter push-down in rare cases. This commit
also adds verification to the set for test builds, and a unit test for
the
Addmethod.Fixes #137381
Release note: None
props: rename EquivSet to EquivGroups
This commit renames
EquivSettoEquivGroupsto better reflect thefact that it tracks groups of equivalent groups, rather than a single
set of equivalent columns.
Epic: None
Release note: None
props: remove initial buffer from EquivGroups
This commit removes the inline buffer from
EquivGroups, as well as theNewEquivGroups()that was previously used to set up the buffer. Thissimplifies usage, and doesn't appear to affect performance when the
set is used in
JoinOrderBuilderor inFuncDepSet.Informs #83963
Release note: None
props: add methods to EquivGroups for use in FuncDepSet
This commit adds several new methods along with unit tests to
EquivGroupsto prepare its use in tracking equivalencies in
FuncDepSet.Informs #83963
Release note: None
opt/props: use EquivGroups in FuncDepSet for tracking equivalences
Previously,
FuncDepSetwas fairly wasteful in how it tracked sets ofequivalent columns: for each column in the equiv group, an FD was maintained
from that column to all other columns in the group. This meant that there
were
2nColSetsfor each equiv group (wherenis the number of columnsin the group).
This patch modifies
FuncDepSetand its internals to useprops.EquivGroupsinstead, which keeps a single
ColSetfor each equiv group. This significantlycuts down on allocations for queries with many columns and equalities, both
because less
ColSetsspill to heap, and because less FDs are added tothe
depsslice.Fixes #83963
Release note: None