Skip to content

opt: refactor and test lookup join key column and expr generation#79911

Merged
craig[bot] merged 8 commits intocockroachdb:masterfrom
mgartner:lookup-join-constraint-builder
Apr 29, 2022
Merged

opt: refactor and test lookup join key column and expr generation#79911
craig[bot] merged 8 commits intocockroachdb:masterfrom
mgartner:lookup-join-constraint-builder

Conversation

@mgartner
Copy link
Copy Markdown
Contributor

@mgartner mgartner commented Apr 13, 2022

opt: simplify fetching outer column in CustomFuncs.findComputedColJoinEquality

Previously, CustomFuncs.findComputedColJoinEquality used
CustomFuncs.OuterCols to retrieve the outer columns of computed column
expressions. CustomFuncs.OuterCols returns the cached outer columns in
the expression if it is a memo.ScalarPropsExpr, and falls back to
calculating the outer columns with memo.BuildSharedProps otherwise.
Computed column expressions are never memo.ScalarPropsExprs, so we use
just use memo.BuildSharedProps directly.

Release note: None

opt: make RemapCols a method on Factory instead of CustomFuncs

Release note: None

opt: use partial-index-reduced filters when building lookup expressions

This commit makes a minor change to generateLookupJoinsImpl.
Previously, equality filters were extracted from the original ON
filters. Now they are extracted from filters that have been reduced by
partial index implication. This has no effect on behavior because
equality filters that reference columns in two tables cannot exist in
partial index predicates, so they will never be eliminated during
partial index implication.

Release note: None

opt: moves some lookup join generation logic to lookup join package

This commit adds a new lookupjoin package. Logic for determining the
key columns and lookup expressions for lookup joins has been moved to
lookupJoin.ConstraintBuilder. The code was moved with as few changes
as possible, and the behavior does not change in any way. This move will
make it easier to test this code in isolation in the future, and allow
for further refactoring.

Release note: None

opt: generalize lookupjoin.ConstraintBuilder API

This commit makes the lookupjoin.ConstraintBuilder API more general to
make unit testing easier in a future commit.

Release note: None

opt: add data-driven tests for lookupjoin.ConstraintBuilder

Release note: None

opt: add lookupjoin.Constraint struct

The lookupjoin.Constraint struct has been added to encapsulate
multiple data structures that represent a strategy for constraining a
lookup join.

Release note: None

@mgartner mgartner requested review from a team, msirek and rytaft April 13, 2022 18:54
@mgartner mgartner requested a review from a team as a code owner April 13, 2022 18:54
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mgartner mgartner force-pushed the lookup-join-constraint-builder branch 3 times, most recently from c9f03c8 to bb70471 Compare April 13, 2022 20:34
Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent refactoring and new unit tests!
:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/lookupjoin/constraint_builder.go, line 80 at r6 (raw file):

// return values:
//
//   keyCols          - A list of columns to be used as lookup join key columns.

nit: maybe indicate these are left table columns and that this is an ordered list, with columns corresponding the those in rightSideCols.


pkg/sql/opt/lookupjoin/constraint_builder_test.go, line 53 at r6 (raw file):

//       Information about the left columns.
//
//     - index=(<column> [asc|desc], ...)

Since this supports [asc|desc] ordering, maybe a test with a range condition on a descending column would be useful.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

Reviewed 1 of 1 files at r1, 5 of 5 files at r2, 1 of 1 files at r3, 8 of 8 files at r4, 3 of 3 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/lookupjoin/constraint_builder.go, line 85 at r4 (raw file):

// both. If both keyCols and lookupExpr are nil, then the index cannot be used
// for a lookup join.
func (b *ConstraintBuilder) Build(

Any chance you can split this into two commits that will be squashed before merging? One that literally just moves code with no changes from the original lookup join code, and one that makes the remaining changes?


pkg/sql/opt/lookupjoin/constraint_builder.go, line 92 at r4 (raw file):

	inputProjections memo.ProjectionsExpr,
	constFilters memo.FiltersExpr,
	rightSideCols opt.ColList,

Do you think it's worth packaging this stuff into a new struct, lookupjoin.Constraint, or something like that? You could also save that for a future PR...


pkg/sql/opt/xform/join_funcs.go, line 374 at r4 (raw file):

		keyCols, lookupExpr, inputProjections, constFilters, rightSideCols :=
			cb.Build(index, onFilters, optionalFilters)
		if len(keyCols) == 0 && len(lookupExpr) == 0 {

If you make a new struct for lookupjoin.Constraint, this check could become a method on that struct.

Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/lookupjoin/constraint_builder.go, line 85 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Any chance you can split this into two commits that will be squashed before merging? One that literally just moves code with no changes from the original lookup join code, and one that makes the remaining changes?

I really tried to make no changes to the logic here. The logic inside Build is the same as the logic removed in generateLookupJoinsImpl, except for referencing the ConstraintBuilder fields rather than CustomFuncs fields and some additional comments. There was an intertwined section of code that created a memo.LookupJoin that I did not move here. All the helper functions are the same, except for the receiver type. I'm not really sure how to move the logic in a way that makes few changes, but I know we've been bitten by moving code in the past. Did you have a specific idea in mind? I suppose I could try doing the work in these N commits:

  1. Move the core key column and lookupExpr logic into a function rather than a struct method into a separate file in the xform package. Move the helper function to the separate file too.
  2. Add the ConstraintBuilder struct, adding Init, Build, and turning the helpers into methods of that struct.
  3. Move the ConstraintBuilder to a separate package.

pkg/sql/opt/lookupjoin/constraint_builder.go, line 92 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Do you think it's worth packaging this stuff into a new struct, lookupjoin.Constraint, or something like that? You could also save that for a future PR...

Yes, there's lots of refactoring I want to do here. The first step was to extract it before changing too much.


pkg/sql/opt/xform/join_funcs.go, line 374 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

If you make a new struct for lookupjoin.Constraint, this check could become a method on that struct.

Good idea. I'll explore that in a future PR.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/lookupjoin/constraint_builder.go line 85 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I really tried to make no changes to the logic here. The logic inside Build is the same as the logic removed in generateLookupJoinsImpl, except for referencing the ConstraintBuilder fields rather than CustomFuncs fields and some additional comments. There was an intertwined section of code that created a memo.LookupJoin that I did not move here. All the helper functions are the same, except for the receiver type. I'm not really sure how to move the logic in a way that makes few changes, but I know we've been bitten by moving code in the past. Did you have a specific idea in mind? I suppose I could try doing the work in these N commits:

  1. Move the core key column and lookupExpr logic into a function rather than a struct method into a separate file in the xform package. Move the helper function to the separate file too.
  2. Add the ConstraintBuilder struct, adding Init, Build, and turning the helpers into methods of that struct.
  3. Move the ConstraintBuilder to a separate package.

I'm suggesting that one of the commits should be literally just copy-paste with no changes, so it will not even compile, and is only for the purpose of making reviewing easier -- it will need to be squashed with the following commit before merging.

I think this is only needed for cases like this where there are large chunks of code moving, and as a reviewer, you want to be able to see the small changes that were made.

Does that make sense?

…nEquality

Previously, `CustomFuncs.findComputedColJoinEquality` used
`CustomFuncs.OuterCols` to retrieve the outer columns of computed column
expressions. `CustomFuncs.OuterCols` returns the cached outer columns in
the expression if it is a `memo.ScalarPropsExpr`, and falls back to
calculating the outer columns with `memo.BuildSharedProps` otherwise.
Computed column expressions are never `memo.ScalarPropsExpr`s, so we use
just use `memo.BuildSharedProps` directly.

Release note: None
This commit makes a minor change to `generateLookupJoinsImpl`.
Previously, equality filters were extracted from the original `ON`
filters. Now they are extracted from filters that have been reduced by
partial index implication. This has no effect on behavior because
equality filters that reference columns in two tables cannot exist in
partial index predicates, so they will never be eliminated during
partial index implication.

Release note: None
@mgartner mgartner force-pushed the lookup-join-constraint-builder branch from bb70471 to f278dd3 Compare April 27, 2022 16:23
This commit will be squashed before merging. It copies code verbatim
from join_funcs.go to lookupjoin/constraint_builder.go to make it easy
to review the changes made when moving code in the following commit.
@mgartner mgartner force-pushed the lookup-join-constraint-builder branch from f278dd3 to 3a54649 Compare April 27, 2022 16:28
This commit adds a new `lookupjoin` package. Logic for determining the
key columns and lookup expressions for lookup joins has been moved to
`lookupJoin.ConstraintBuilder`. The code was moved with as few changes
as possible, and the behavior does not change in any way. This move will
make it easier to test this code in isolation in the future, and allow
for further refactoring.

Release note: None
This commit makes the lookupjoin.ConstraintBuilder API more general to
make unit testing easier in a future commit.

Release note: None
@mgartner mgartner force-pushed the lookup-join-constraint-builder branch from 3a54649 to ccaa88a Compare April 27, 2022 17:18
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new lookupjoin.Constraint struct in the last commit.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek and @rytaft)


pkg/sql/opt/lookupjoin/constraint_builder.go line 85 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I'm suggesting that one of the commits should be literally just copy-paste with no changes, so it will not even compile, and is only for the purpose of making reviewing easier -- it will need to be squashed with the following commit before merging.

I think this is only needed for cases like this where there are large chunks of code moving, and as a reviewer, you want to be able to see the small changes that were made.

Does that make sense?

I see. I wasn't thinking of making a commit that doesn't compile. The diff between the SQUASH BEFORE MERGING commit and the opt: moves some lookup join generation logic to lookup join package commit should be an easier read now.


pkg/sql/opt/lookupjoin/constraint_builder.go line 80 at r6 (raw file):

Previously, msirek (Mark Sirek) wrote…

nit: maybe indicate these are left table columns and that this is an ordered list, with columns corresponding the those in rightSideCols.

Done.


pkg/sql/opt/lookupjoin/constraint_builder_test.go line 53 at r6 (raw file):

Previously, msirek (Mark Sirek) wrote…

Since this supports [asc|desc] ordering, maybe a test with a range condition on a descending column would be useful.

I don't think it really matters for this lookup constraint code but I added one test for thoroughness.

@mgartner mgartner requested review from a team and cucaroach April 27, 2022 17:47
@mgartner mgartner force-pushed the lookup-join-constraint-builder branch from ccaa88a to 82c05bd Compare April 27, 2022 20:01
Copy link
Copy Markdown
Contributor

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 16 of 16 files at r7, 5 of 5 files at r8, 1 of 1 files at r9, 2 of 2 files at r10, 9 of 9 files at r11, 3 of 3 files at r12, 6 of 6 files at r13, 1 of 3 files at r14.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @msirek, and @rytaft)

@mgartner
Copy link
Copy Markdown
Contributor Author

Friendly ping @rytaft. If it's still too hard to review or too risky of a move, let me know and I'll take another shot at it.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! :lgtm:

Reviewed 16 of 16 files at r7, 4 of 5 files at r8, 9 of 9 files at r11, 3 of 3 files at r12, 6 of 6 files at r13, 3 of 3 files at r14, 1 of 1 files at r15, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @msirek)


pkg/sql/opt/lookupjoin/constraint_builder.go line 85 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I see. I wasn't thinking of making a commit that doesn't compile. The diff between the SQUASH BEFORE MERGING commit and the opt: moves some lookup join generation logic to lookup join package commit should be an easier read now.

Thanks!


pkg/sql/opt/lookupjoin/constraint_builder.go line 41 at r15 (raw file):

	// KeyCols is an ordered list of columns from the left side of the join to
	// be used as lookup join key columns. This list corresponds to the columns
	// in RightSideCols. It will be nil if LookupExpr is nil.

nit: if LookupExpr is non-nil

The `lookupjoin.Constraint` struct has been added to encapsulate
multiple data structures that represent a strategy for constraining a
lookup join.

Release note: None
@mgartner mgartner force-pushed the lookup-join-constraint-builder branch from 82c05bd to 6f3230e Compare April 29, 2022 14:23
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @cucaroach, @msirek, and @rytaft)


pkg/sql/opt/lookupjoin/constraint_builder.go line 41 at r15 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: if LookupExpr is non-nil

Donne.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 29, 2022

Build succeeded:

@craig craig bot merged commit 0777ef7 into cockroachdb:master Apr 29, 2022
@mgartner mgartner deleted the lookup-join-constraint-builder branch April 29, 2022 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants