Skip to content

opt: fix wrong results due to sorting 1st join of paired join#89840

Closed
msirek wants to merge 2 commits intocockroachdb:masterfrom
msirek:89603
Closed

opt: fix wrong results due to sorting 1st join of paired join#89840
msirek wants to merge 2 commits intocockroachdb:masterfrom
msirek:89603

Conversation

@msirek
Copy link
Copy Markdown
Contributor

@msirek msirek commented Oct 12, 2022

Fixes #89603

This fixes an issue where an illegal paired join plan is created by the
optimizer where the first join in the pair is sorted. This breaks the
assumption required in paired joiner logic that the first join in the
pair is never sorted or distributed for proper interpretation of the
continuation column:

// ContinuationCol is the column ID of the continuation column when
// IsFirstJoinInPairedJoiner is true. The continuation column is a
// boolean column that indicates whether an output row is a
// continuation of a group corresponding to a single left input row.
ContinuationCol opt.ColumnID

Function lookupJoinCanProvideOrdering allows the required ordering to
be passed to the child of the first join in a paired join if only the
input columns are projected (no lookup columns). This is not sufficient
because only when the first join can further pass the required ordering
on to its input can a sort be avoided.

The solution is to modify lookupJoinCanProvideOrdering to indicate the
second join in a paired join can provide an ordering if both the first
join and the first join's child can provide the ordering.

Release note (bug fix): This patch fixes possible, but rare, incorrect
results from paired lookup joins where the first join in the pair is
sorted.

xform: assertion for unenforcible props

This commit adds an assertion that a sort on the first join of a paired
join is never enforced.

Fixes #89603

Release note: None

@msirek msirek requested a review from a team as a code owner October 12, 2022 16:46
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Fixes cockroachdb#89603

This fixes an issue where an illegal paired join plan is created by the
optimizer where the first join in the pair is sorted. This breaks the
assumption required in paired joiner logic that the first join in the
pair is never sorted or distributed for proper interpretation of the
continuation column:
```
// ContinuationCol is the column ID of the continuation column when
// IsFirstJoinInPairedJoiner is true. The continuation column is a
// boolean column that indicates whether an output row is a
// continuation of a group corresponding to a single left input row.
ContinuationCol opt.ColumnID
```

Function `lookupJoinCanProvideOrdering` allows the required ordering to
be passed to the child of the first join in a paired join if only the
input columns are projected (no lookup columns). This is not sufficient
because only when the first join can further pass the required ordering
on to its input can a sort be avoided.

The solution is to modify `lookupJoinCanProvideOrdering` to indicate the
second join in a paired join can provide an ordering if both the first
join and the first join's child can provide the ordering.

Release note (bug fix): This patch fixes possible, but rare, incorrect
results from paired lookup joins where the first join in the pair is
sorted.
@msirek msirek force-pushed the 89603 branch 2 times, most recently from fa0972b to 6b57200 Compare October 12, 2022 17:57
Copy link
Copy Markdown
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that should be my last push. Just making sure the sanity check works.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)

@msirek msirek requested a review from sumeerbhola October 12, 2022 18:13
This commit adds an assertion that a sort on the first join of a paired
join is never enforced.

Fixes cockroachdb#89603

Release note: None
Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this will actually solve the problem (or at least that the test will verify it does). We should probably use the memo format for the test. I'm suspicious of #84689 - I think maybe we need to add a !lookupJoin.IsSecondJoinInPairedJoiner check to the branch I added.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @sumeerbhola)


pkg/sql/opt/xform/optimizer.go line 589 at r2 (raw file):

	// Was a required physical property applied which is unenforcible on the group
	// member and might do something bad like require an illegal sort?
	o.assertEnforcibleProps(member, required)

Seems like the overhead for this could be non-negligible, should it be a test-only check?


pkg/sql/opt/xform/testdata/rules/join line 12944 at r1 (raw file):

# The sort must be done as the last step instead of in between the first and
# second joins of the paired join.
opt set=testing_optimizer_random_seed=4057832385546395198 set=testing_optimizer_cost_perturbation=1.0

[nit] could probably use a less verbose format here.

Copy link
Copy Markdown
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried reverting #84689, and the problem is still there, so I don't think it's related. Ideally there would be a way to not even optimize an enforcer. Right now we're just sort of relying on the sort operation to not get chosen because the input can provide the ordering. But as long as we're adding the enforcer to the memo, it might get chosen if costs are low. I was trying some other fixes to avoid optimizing the enforcer at all for the first join in the pair, but that seemed like a risky change.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @sumeerbhola)

@DrewKimball
Copy link
Copy Markdown
Collaborator

DrewKimball commented Oct 12, 2022

How does the (lookupJoin.IsFirstJoinInPairedJoiner || lookupJoin.IsSecondJoinInPairedJoiner) prevent cases where we plan a sort on top of the first paired join? Any time we require an ordering, we call optimizeEnforcer with an enforcer added, so we don't really have a way to only propagate the child ordering.

Also, this has me thinking we should add a !(lookupJoin.IsFirstJoinInPairedJoiner || lookupJoin.IsSecondJoinInPairedJoiner) check to the canProjectUsingOnlyInputCols case.

@DrewKimball
Copy link
Copy Markdown
Collaborator

What are the options for preventing this? Maybe it's too hacky, but one thing we could try would be to cost a sort on top of the first join in a paired join with hugeCost. Another solution could be to just check for this specific case in the optimizer somewhere (sounds like you tried this?). Finally, we could have an optimizer-only expression that wraps paired joins to prevent cases like this, since treating them as independent has lead to a few bugs already

Copy link
Copy Markdown
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't prevent it entirely I think. It just makes selection of the optimized expression without sort much more likely due to costs. Like I said above, I was trying a fix to avoid optimizing the SortExpr entirely, but it's risky. Maybe we could do something like what's done in Builder.buildDistribute where a NoOpDistribution is removed entirely. If we have a No-Op sort, would it be safe to remove it, I wonder? Are there any query plans created that rely on a sort taking place, and removing the sort could cause problems?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @sumeerbhola)

Copy link
Copy Markdown
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering the hugeCost solution, but we'd have to make sure that kicks in even when fuzzing costs and that fuzzed costs of other expressions are never greater than hugeCost. Yeah, I'm not sure if I like that or not, since there's still an illegal group member in the memo. I will take a look at the #84689 code too and see if we should try to play it safe with paired joins. Even if we disable some optimizations, we could always re-enable them later after more testing.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @sumeerbhola)

@DrewKimball
Copy link
Copy Markdown
Collaborator

I think I'm leaning toward unifying paired lookup joins into one expression during optimization - that seems like it should solve problems of this nature for certain

Copy link
Copy Markdown
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I might let this ruminate a bit; looking at something else this afternoon.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @sumeerbhola)

@msirek
Copy link
Copy Markdown
Contributor Author

msirek commented Nov 29, 2022

Closing in favor of #92632

@msirek msirek closed this Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: incorrect results due to sort in-between paired joins

3 participants