Skip to content

opt: add rule to replace outer cols with equivalent non-outer cols#92378

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
DrewKimball:decorrelate
Jan 27, 2023
Merged

opt: add rule to replace outer cols with equivalent non-outer cols#92378
craig[bot] merged 1 commit intocockroachdb:masterfrom
DrewKimball:decorrelate

Conversation

@DrewKimball
Copy link
Copy Markdown
Collaborator

This commit adds three decorrelation rules, TryRemapJoinOuterColsRight, TryRemapJoinOuterColsLeft, and TryRemapSelectOuterCols. These rules match joins and selects that have a correlated input and an equality filter between an outer and a non-outer column from the input. Upon matching, the rules traverse the input as far as it would be valid to push the equality filter through normal push-down rules, and replace any encountered references to the equivalent outer column. This approach avoids interactions with rules like TryDecorrelateSelect, which attempt to pull filters up the operator tree when correlation is present.

Fixes #88885

Release note: None

@DrewKimball DrewKimball requested a review from a team as a code owner November 23, 2022 08:28
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@DrewKimball
Copy link
Copy Markdown
Collaborator Author

Hold off on reviewing - I've got some test changes to investigate.

@DrewKimball DrewKimball added the do-not-merge bors won't merge a PR with this label. label Nov 23, 2022
@DrewKimball DrewKimball removed the do-not-merge bors won't merge a PR with this label. label Nov 30, 2022
@DrewKimball
Copy link
Copy Markdown
Collaborator Author

This is RFAL. The plan for TPCH Q2 changed and is slower; this seems to be because decorrelation rules no longer pull the groupby above the apply-join. Stats end up being further off in some additional cases, and the optimizer chooses a (buffering) sort + limit instead of a top-k operator because it believes the sort only receives one row as input. I think this is likely because the groupby makes stats estimation more difficult for the operators above it.

@DrewKimball DrewKimball requested review from a team and removed request for a team December 1, 2022 01:21
Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @msirek)


pkg/bench/rttanalysis/orm_queries_bench_test.go line 319 at r3 (raw file):

		{
			Name: "hasura column descriptions 8 tables",

in #92929 the preceding test case was also skipped, so could you unskip it when this PR is good to go?

Copy link
Copy Markdown
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @rafiss)


pkg/bench/rttanalysis/orm_queries_bench_test.go line 319 at r3 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

in #92929 the preceding test case was also skipped, so could you unskip it when this PR is good to go?

Done.

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @msirek, and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1072 at r4 (raw file):

// valid to push down a filter on those non-outer columns. If a reference to the
// outer column is discovered during this traversal, it is valid to replace it
// with one of the non-outer columns in the set.

Is there a reason that we can't do a single traversal and replace any of the outer columns we find?


pkg/sql/opt/norm/decorrelate_funcs.go line 1096 at r4 (raw file):

					// This outer-column reference can be remapped.
					succeeded = true
					return c.f.ConstructVariable(replaceCol)

Do we need any logic to prevent making a substitution that actually drops a filter? For example, how do we prevent replacing the expression non_outer_col = outer_col with non_outer_col = non_outer_col?

Copy link
Copy Markdown
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1072 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is there a reason that we can't do a single traversal and replace any of the outer columns we find?

Hm, that might be possible with an EquivSet. I'll check it out.


pkg/sql/opt/norm/decorrelate_funcs.go line 1096 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Do we need any logic to prevent making a substitution that actually drops a filter? For example, how do we prevent replacing the expression non_outer_col = outer_col with non_outer_col = non_outer_col?

Do you mean a case where there are two non_outer_col = outer_col filters to start with, one above the other in the tree?

Copy link
Copy Markdown
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1072 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Hm, that might be possible with an EquivSet. I'll check it out.

Update: it should be possible, but would introduce some complexity, since we can no longer restrict the set of columns we're considering to just non-outer columns - we would also need to keep track of which outer columns are equal to which non-outer columns, but only those outer columns that were "outer" in the original matched expression. Additionally, it would require more allocations since ColSet copies would become slice copies.
So my feeling is that if the query is complex enough to make the number of traversals significant, doing a single traversal would cause the number of allocations to be significant. Do you have any strong opinions about this?

Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I tried to find problem cases involving composite sensitive columns of different data types, but these seem to not be comparable with each other in an IN or NOT IN subquery clause, so that seems to prevent the introduction of column comparisons of data types which didn't exist in the un-rewritten query, for example, float with int, which didn't exist before. For example:

create table t5 (a float, b dec);
create table t6 (a float, b float);
select * from t6 where (b) in (select a from t5 where t5.b = t6.b);

This derives a t5.a = t5.b term, but since t6.b and t5.a are equivalent, and of the same data type (in the IN clause), we're not introducing a new term comparing different data types.

Honestly, it's a little difficult to tell if any problem cases are possible just from examining the code. I wonder if it would make sense to put in a sanity check that the type family of the column we're mapping to matches that of the "from" column.

For the TPCH Q2 regression, maybe it would be good to open an issue for that. These new decorrelation rules should have a very positive impact on other queries so a regression here or there seems acceptable and may highlight other issues which need addressing (like stats accuracy of groupby, as you mentioned). One alternative solution could be to derive new terms instead of just mapping them, and link them. When one term in the pair is applied, mark the other as not requiring evaluation. This could help prevent some beneficial rewrite rules from no longer being able to fire after these rewrites. But maybe such an approach is too complex and could lead to more stats inaccuracies due to redundant terms. Just throwing it out there to ponder.

You may wish to get at least one more approval due to the complexity of this PR, but it looks good to me.

Reviewed 3 of 3 files at r1, 7 of 7 files at r2, 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1096 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Do you mean a case where there are two non_outer_col = outer_col filters to start with, one above the other in the tree?

Maybe the question being asked here is, could we ever drop a filter which is still needed? It seems that if the non_outer_col is already in the same equivalence set with outer_col due to IN or NOT IN subquery connecting conditions, it is OK to generate a non_outer_col = non_outer_col filter because the filtering would be done by the IN or NOT IN. This mapped term gets normalized into something like non_outer_col IS DISTINCT FROM CAST(NULL AS INT8).
e.g.

explain select * from t1 where a in (select a from t3 where t3.a = t1.a);
                                          info
----------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • hash join (semi)
  │ estimated row count: 1
  │ equality: (a) = (a)
  │
  ├── • filter
  │   │ estimated row count: 1
  │   │ filter: a IS DISTINCT FROM CAST(NULL AS INT8)
  │   │
  │   └── • scan
  │         estimated row count: 1 (100% of the table; stats collected 51 minutes ago)
  │         table: t1@t1_pkey
  │         spans: FULL SCAN
  │
  └── • filter
      │ estimated row count: 1
      │ filter: a IS DISTINCT FROM CAST(NULL AS INT8)
      │
      └── • scan
            estimated row count: 1 (100% of the table; stats collected 34 minutes ago)
            table: t3@t3_pkey
            spans: FULL SCAN

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering if this PR is still planned for 23.1?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)

@DrewKimball
Copy link
Copy Markdown
Collaborator Author

just wondering if this PR is still planned for 23.1?

Still planned, I just got distracted by other things for a bit.

Copy link
Copy Markdown
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1072 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Update: it should be possible, but would introduce some complexity, since we can no longer restrict the set of columns we're considering to just non-outer columns - we would also need to keep track of which outer columns are equal to which non-outer columns, but only those outer columns that were "outer" in the original matched expression. Additionally, it would require more allocations since ColSet copies would become slice copies.
So my feeling is that if the query is complex enough to make the number of traversals significant, doing a single traversal would cause the number of allocations to be significant. Do you have any strong opinions about this?

@mgartner friendly ping on this point - TL;DR is that we could do it in one pass with more allocations.


pkg/sql/opt/norm/decorrelate_funcs.go line 1096 at r4 (raw file):

Previously, msirek (Mark Sirek) wrote…

Maybe the question being asked here is, could we ever drop a filter which is still needed? It seems that if the non_outer_col is already in the same equivalence set with outer_col due to IN or NOT IN subquery connecting conditions, it is OK to generate a non_outer_col = non_outer_col filter because the filtering would be done by the IN or NOT IN. This mapped term gets normalized into something like non_outer_col IS DISTINCT FROM CAST(NULL AS INT8).
e.g.

explain select * from t1 where a in (select a from t3 where t3.a = t1.a);
                                          info
----------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • hash join (semi)
  │ estimated row count: 1
  │ equality: (a) = (a)
  │
  ├── • filter
  │   │ estimated row count: 1
  │   │ filter: a IS DISTINCT FROM CAST(NULL AS INT8)
  │   │
  │   └── • scan
  │         estimated row count: 1 (100% of the table; stats collected 51 minutes ago)
  │         table: t1@t1_pkey
  │         spans: FULL SCAN
  │
  └── • filter
      │ estimated row count: 1
      │ filter: a IS DISTINCT FROM CAST(NULL AS INT8)
      │
      └── • scan
            estimated row count: 1 (100% of the table; stats collected 34 minutes ago)
            table: t3@t3_pkey
            spans: FULL SCAN

As Mark says, we only make the replacement in cases where it doesn't affect the result at the level of the matched filter. I'll add a test case that leads to a column being set equal to itself.

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool. Nice work!

Reviewed 1 of 3 files at r1, 4 of 7 files at r2, 1 of 1 files at r4, 7 of 7 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1072 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

@mgartner friendly ping on this point - TL;DR is that we could do it in one pass with more allocations.

Thanks for the explanation. No need to change this.


pkg/sql/opt/norm/decorrelate_funcs.go line 1096 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

As Mark says, we only make the replacement in cases where it doesn't affect the result at the level of the matched filter. I'll add a test case that leads to a column being set equal to itself.

I see. I think my original concern is void - this would only happen if we had something like an inner filter of inner_col = outer_col and an outer filter of outer_col = inner_col - I don't even know if that's possible with other normalization rules that should eliminate one of those filters, but even if it were, we'd never eliminate the outer filter.


pkg/sql/opt/norm/decorrelate_funcs.go line 1091 at r5 (raw file):

		// candidateCols is the set of input columns for which it may be possible to
		// push a filter constraining the column to be equal to an outer column.
		candidateCols := input.Relational().FuncDeps.ComputeEquivGroup(col)

super nit: Can you think of a better name for candidateCols (and associated functions with "candidate" in the name)? It's mildly confusing because it's unclear if they represent columns that are candidates to be replaced or candidates to replace with. Maybe substituteCols?


pkg/sql/opt/norm/decorrelate_funcs.go line 1095 at r5 (raw file):

			candidateCols = filters[i].ScalarProps().FuncDeps.ComputeEquivClosureNoCopy(candidateCols)
		}
		candidateCols.DifferenceWith(input.Relational().OuterCols)

nit: candidateCols.DifferenceWith(outerCols)


pkg/sql/opt/norm/decorrelate_funcs.go line 1104 at r5 (raw file):

			}
			if v, ok := e.(*memo.VariableExpr); ok && v.Col == col {
				if replaceCol, ok := candidateCols.Next(0); ok {

nit: You could use candidateCols.SingleColumn() here instead and avoid the if.


pkg/sql/opt/norm/decorrelate_funcs.go line 1111 at r5 (raw file):

			}
			switch t := e.(type) {
			case memo.RelExpr:

nit: I think you can mix structs and interfaces in a type switch, for example:

switch t := e.(type) {
case *memo.VariableExpr:
  // ...
case memo.RelExpr:
  // ...
case opt.ScalarExpr:
  // ...
}

pkg/sql/opt/norm/decorrelate_funcs.go line 1125 at r5 (raw file):

			// If we made it here, candidateCols has been modified so that calling the
			// replace function on the current expression is valid.
			return c.f.Replace(e, replaceFn)

Don't we need to reset candidate cols to what it was before we called getCandidateColsRelExpr? For example, if we recurse into the left side of a join and mutate the candidate columns, don't we want to revert the candidate columns when we pop back up and recurse into the right side of a join?


pkg/sql/opt/norm/decorrelate_funcs.go line 1216 at r5 (raw file):

	// both inputs. We filter out non-output columns from candidateCols on
	// each recursive call to TryRemapOuterCols, so we don't need to remove any
	// columns here.

I don't fully understand this last sentence. Don't we need to filter out the SetPrivate.OutCols for the recursion with the replaceFn so that we don't replace an outer column with another outer column (from the perspective of the set's left or right child)? I think a test with replacement in two sides of a setop might reveal this.


pkg/sql/opt/norm/rules/decorrelate.opt line 86 at r5 (raw file):

# decorrelation rules because it does not perform any transformations beyond the
# variable replacement. This prevents situations where decorrelation rules make
# the plan worse in their attempts to decorrelate the query.

nit: elaborating on this last sentence or adding an example might be helpful. Or is it obvious how the plans can become worse, and I'm not seeing it?


pkg/sql/opt/norm/rules/decorrelate.opt line 101 at r5 (raw file):


# TryRemapJoinOuterColsLeft is similar to TryRemapJoinOuterColsRight, but it
# applies to the right input of a join.

nit "right input" => "left input"


pkg/sql/opt/norm/testdata/rules/decorrelate line 6154 at r5 (raw file):

# Case with a LeftJoin. The filter can be mapped to the right input.
norm expect=TryRemapJoinOuterColsRight format=hide-all
SELECT * FROM xy INNER JOIN LATERAL (SELECT * FROM ab LEFT JOIN (SELECT u, v + x FROM uv) ON a = u) ON a = x

Can you add a case like this were the remap cannot happen because the join filter doesn't have an equality filter with a column on the right?

Can you also add a case where both the left and right side of the join should have columns remapped? I think that might expose the need to reset candidateCols (see my previous comment).


pkg/sql/opt/norm/testdata/rules/decorrelate line 6293 at r5 (raw file):

# Case with a Union.
norm expect=TryRemapJoinOuterColsRight format=hide-all
SELECT * FROM xy INNER JOIN LATERAL (SELECT * FROM ab UNION (SELECT u, v+x FROM uv)) ON a = x

Can you also add some setop cases where both the left and right side of the setop should have columns remapped? I think that might expose the need to reset candidateCols (see my previous comment).


pkg/sql/opt/norm/testdata/rules/decorrelate line 6308 at r5 (raw file):

# Case with an Intersect.
norm expect=TryRemapJoinOuterColsRight format=hide-all
SELECT * FROM xy INNER JOIN LATERAL (SELECT * FROM ab Intersect (SELECT u, v+x FROM uv)) ON a = x

nit: Intersect => INTERSECT


pkg/sql/opt/norm/testdata/rules/decorrelate line 6323 at r5 (raw file):

# Case with an Except.
norm expect=TryRemapJoinOuterColsRight format=hide-all
SELECT * FROM xy INNER JOIN LATERAL (SELECT * FROM ab Except (SELECT u, v+x FROM uv)) ON a = x

nit: Except => EXCEPT

Copy link
Copy Markdown
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1096 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I see. I think my original concern is void - this would only happen if we had something like an inner filter of inner_col = outer_col and an outer filter of outer_col = inner_col - I don't even know if that's possible with other normalization rules that should eliminate one of those filters, but even if it were, we'd never eliminate the outer filter.

Done.


pkg/sql/opt/norm/decorrelate_funcs.go line 1091 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

super nit: Can you think of a better name for candidateCols (and associated functions with "candidate" in the name)? It's mildly confusing because it's unclear if they represent columns that are candidates to be replaced or candidates to replace with. Maybe substituteCols?

substituteCols sounds good to me. Done.


pkg/sql/opt/norm/decorrelate_funcs.go line 1095 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: candidateCols.DifferenceWith(outerCols)

Done.


pkg/sql/opt/norm/decorrelate_funcs.go line 1104 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: You could use candidateCols.SingleColumn() here instead and avoid the if.

SingleColumn panics if there isn't exactly one column, and there could be multiple possible substitute columns.


pkg/sql/opt/norm/decorrelate_funcs.go line 1111 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: I think you can mix structs and interfaces in a type switch, for example:

switch t := e.(type) {
case *memo.VariableExpr:
  // ...
case memo.RelExpr:
  // ...
case opt.ScalarExpr:
  // ...
}

Nice, done.


pkg/sql/opt/norm/decorrelate_funcs.go line 1125 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Don't we need to reset candidate cols to what it was before we called getCandidateColsRelExpr? For example, if we recurse into the left side of a join and mutate the candidate columns, don't we want to revert the candidate columns when we pop back up and recurse into the right side of a join?

Really good catch. In practice this meant we'd end up firing the rule more often than needed (e.g. once per correlated input of a join/set op). I've refactored the traversal logic so that the new substituteCols gets captured by the replaceFunc definition without also setting the old one.


pkg/sql/opt/norm/decorrelate_funcs.go line 1216 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I don't fully understand this last sentence. Don't we need to filter out the SetPrivate.OutCols for the recursion with the replaceFn so that we don't replace an outer column with another outer column (from the perspective of the set's left or right child)? I think a test with replacement in two sides of a setop might reveal this.

I think this comment was outdated - when getSubstituteColsRelExpr is called for the left input, substituteColumns will be restricted to just the left columns and vice versa for the right input. Additionally, it is possible for OutCols and LeftCols to be the same - say for an Intersect. That being said, I think the clearest thing will be to just construct the new substituteCols from scratch. Let me know what you think.


pkg/sql/opt/norm/rules/decorrelate.opt line 86 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: elaborating on this last sentence or adding an example might be helpful. Or is it obvious how the plans can become worse, and I'm not seeing it?

Added some elaboration, let me know what you think.


pkg/sql/opt/norm/rules/decorrelate.opt line 101 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit "right input" => "left input"

Good catch, done.


pkg/sql/opt/norm/testdata/rules/decorrelate line 6154 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you add a case like this were the remap cannot happen because the join filter doesn't have an equality filter with a column on the right?

Can you also add a case where both the left and right side of the join should have columns remapped? I think that might expose the need to reset candidateCols (see my previous comment).

Done.


pkg/sql/opt/norm/testdata/rules/decorrelate line 6293 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you also add some setop cases where both the left and right side of the setop should have columns remapped? I think that might expose the need to reset candidateCols (see my previous comment).

Done.


pkg/sql/opt/norm/testdata/rules/decorrelate line 6308 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Intersect => INTERSECT

Done.


pkg/sql/opt/norm/testdata/rules/decorrelate line 6323 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Except => EXCEPT

Done.

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1104 at r5 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

SingleColumn panics if there isn't exactly one column, and there could be multiple possible substitute columns.

Oh right, sorry for the noise.


pkg/sql/opt/norm/decorrelate_funcs.go line 1125 at r5 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Really good catch. In practice this meant we'd end up firing the rule more often than needed (e.g. once per correlated input of a join/set op). I've refactored the traversal logic so that the new substituteCols gets captured by the replaceFunc definition without also setting the old one.

💯


pkg/sql/opt/norm/decorrelate_funcs.go line 1216 at r5 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I think this comment was outdated - when getSubstituteColsRelExpr is called for the left input, substituteColumns will be restricted to just the left columns and vice versa for the right input. Additionally, it is possible for OutCols and LeftCols to be the same - say for an Intersect. That being said, I think the clearest thing will be to just construct the new substituteCols from scratch. Let me know what you think.

LGTM.


pkg/sql/opt/norm/decorrelate_funcs.go line 1133 at r6 (raw file):

		// outer-column remapping within (the children of) a RelExpr. Note that
		// getSubstituteColsRelExpr copies substituteCols before modifying it, so
		// different branches of the traversal don't interact.

nit: "dont' interact on the same column set."


pkg/sql/opt/norm/rules/decorrelate.opt line 86 at r5 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Added some elaboration, let me know what you think.

👍


pkg/sql/opt/norm/testdata/rules/decorrelate line 6343 at r6 (raw file):

 │         ├── scan uv
 │         └── projections
 │              └── v + u

It's tricky to see why it's valid to replace v+x with v+u. I think it might be more obvious if you alias the result of the union and qualify it in the join condition, e.g.:

SELECT * FROM xy INNER JOIN LATERAL (SELECT * FROM (SELECT a, b+x FROM ab) UNION (SELECT u, v+x FROM uv)) AS rhs ON rhs.a = x

You might want to do the same with the other set test cases.

This commit adds three decorrelation rules, `TryRemapJoinOuterColsRight`,
`TryRemapJoinOuterColsLeft`, and `TryRemapSelectOuterCols`. These rules
match joins and selects that have a correlated input and an equality
filter between an outer and a non-outer column from the input. Upon
matching, the rules traverse the input as far as it would be valid to
push the equality filter through normal push-down rules, and replace
any encountered references to the equivalent outer column. This approach
avoids interactions with rules like `TryDecorrelateSelect`, which attempt
to pull filters *up* the operator tree when correlation is present.

Fixes cockroachdb#88885

Release note: None
Copy link
Copy Markdown
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rafiss)


pkg/sql/opt/norm/decorrelate_funcs.go line 1133 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: "dont' interact on the same column set."

Done.


pkg/sql/opt/norm/testdata/rules/decorrelate line 6343 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It's tricky to see why it's valid to replace v+x with v+u. I think it might be more obvious if you alias the result of the union and qualify it in the join condition, e.g.:

SELECT * FROM xy INNER JOIN LATERAL (SELECT * FROM (SELECT a, b+x FROM ab) UNION (SELECT u, v+x FROM uv)) AS rhs ON rhs.a = x

You might want to do the same with the other set test cases.

I added an alias to each of the Set ops foo(a, b) and had the filter reference it. Hopefully that clears things up.

@DrewKimball
Copy link
Copy Markdown
Collaborator Author

Thanks for the thorough reviews!

Github CI failure is a known flake.
bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 27, 2023

Build succeeded:

@craig craig bot merged commit efb4481 into cockroachdb:master Jan 27, 2023
@DrewKimball DrewKimball deleted the decorrelate branch January 27, 2023 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: add rule to replace outer cols with equivalent non-outer cols

5 participants