Skip to content

release-22.1: memo: better selectivity estimates on single-table ORed predicates#94439

Merged
mgartner merged 2 commits intocockroachdb:release-22.1from
mgartner:backport22.1-89358-94389
Dec 30, 2022
Merged

release-22.1: memo: better selectivity estimates on single-table ORed predicates#94439
mgartner merged 2 commits intocockroachdb:release-22.1from
mgartner:backport22.1-89358-94389

Conversation

@mgartner
Copy link
Copy Markdown
Contributor

@mgartner mgartner commented Dec 29, 2022

Backport:

Please see individual PRs for details.

Note that the setting optimizer_use_improved_disjunction_stats defaults to false in
this backport (differing from the original commit) so that users must opt into the new
behavior.

Release justification: This backports improved statistics logic behind a session setting.

/cc @cockroachdb/release

@mgartner mgartner requested a review from a team December 29, 2022 21:38
@mgartner mgartner requested a review from a team as a code owner December 29, 2022 21:38
@mgartner mgartner requested a review from rytaft December 29, 2022 21:38
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Dec 29, 2022

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mgartner
Copy link
Copy Markdown
Contributor Author

Do not review yet. I need to figure out why some of the join stats have changed.

@mgartner mgartner force-pushed the backport22.1-89358-94389 branch 2 times, most recently from 165059d to 1c32595 Compare December 29, 2022 22:09
@mgartner mgartner requested a review from msirek December 29, 2022 22:10
@mgartner
Copy link
Copy Markdown
Contributor Author

Ok, this is ready for review.

mgartner added a commit to mgartner/cockroach that referenced this pull request Dec 29, 2022
These subtle bugs were discovered while creating cockroachdb#94439, the backport
of cockroachdb#89358 and cockroachdb#94389 to v22.1. They have already been included in that
backport. The backport to v22.2 has not yet been created, and it will
include this commit.

Release note: None
Fixes cockroachdb#75090

Currently, the optimizer only does a good job of estimating the
selectivity of single-table ORed predicates when all disjuncts are tight
constraints. Otherwise we give up and go with the default selectivity of
1/3 for the entire filter. If any of the disjuncts has a selectivity
above 1/3, then the total selectivity should be at least as high as the
selectivity of that disjunct. Any failure to find a tight constraint for
a given disjunct should only cause a selectivity of 1/3 to be used for
that disjunct, not the entire filter.

This is addressed by using the method of combining ORed selectivities
introduced by cockroachdb#74303:
```
   Given predicates A and B and probability function P:

      P(A or B) = P(A) + P(B) - P(A and B)
```
The above formula is applied n-1 times, given n disjuncts.
For each disjunct which is not a tight constraint, 1/3 is used for that
predicate's selectivity in the formula.

Release note (bug fix): This patch fixes incorrect selectivity
estimation for queries with ORed predicates all referencing a
common single table.
@mgartner mgartner force-pushed the backport22.1-89358-94389 branch from 1c32595 to 1b9d287 Compare December 29, 2022 22:43
Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 20 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/memo/statistics_builder.go line 3175 at r4 (raw file):

			// not an Or, bump the number of unapplied disjuncts.
			numUnappliedDisjuncts++
			return

It seems like we want the new flag to control the setting of unconstrained and incrementing of numUnappliedDisjuncts instead of just removing the old logic.


pkg/sql/opt/memo/statistics_builder.go line 3194 at r4 (raw file):

	collectConstraints(or.Right)

	return constraints, numUnappliedDisjuncts

Same here. It seems like we want to follow the old behavior when the flag is false. Otherwise we're changing the result, even when the flag is false.

Code quote:

not an Or, bump the number of unapplied disjuncts.

@mgartner
Copy link
Copy Markdown
Contributor Author

pkg/sql/opt/memo/statistics_builder.go line 3175 at r4 (raw file):

Previously, msirek (Mark Sirek) wrote…

It seems like we want the new flag to control the setting of unconstrained and incrementing of numUnappliedDisjuncts instead of just removing the old logic.

Notice the check I added to gate the old logic above: if numUnappliedDisjuncts == 0 {. This matches the old logic which needing to contort the function signature any more. I originally missed this, which is why I created #94442 so these fixes make it back into master and the v22.2 backport.

Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 13 files at r1, 17 of 20 files at r4, 20 of 20 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


-- commits line 34 at r5:
Should this say "defaults to false" ?


pkg/sql/opt/memo/statistics_builder.go line 3175 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Notice the check I added to gate the old logic above: if numUnappliedDisjuncts == 0 {. This matches the old logic which needing to contort the function signature any more. I originally missed this, which is why I created #94442 so these fixes make it back into master and the v22.2 backport.

OK, I see.

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 13 files at r1, 20 of 20 files at r4, 20 of 20 files at r5, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)

This commit adds the `optimizer_use_improved_disjunction_stats` session
setting that defaults to `false`. When `true`, the improved disjunction
logic added in cockroachdb#89358 is performed. When `false`, the logic reverts to
the previous logic. This setting will allow use to backport the new
logic without impacting query plans of running clusters by defaulting
the setting to false.

Release note: None
@mgartner mgartner force-pushed the backport22.1-89358-94389 branch from 1b9d287 to ad2e713 Compare December 30, 2022 17:18
@mgartner
Copy link
Copy Markdown
Contributor Author

Should this say "defaults to false" ?

Good catch. I've updated the commit and also made a note in the PR description to highlight that this differs from the original commit.

@mgartner
Copy link
Copy Markdown
Contributor Author

TFTRs!

@mgartner mgartner merged commit 000c962 into cockroachdb:release-22.1 Dec 30, 2022
@mgartner mgartner deleted the backport22.1-89358-94389 branch December 30, 2022 18:44
craig bot pushed a commit that referenced this pull request Jan 3, 2023
92327: backupccl: add tenant backup nemesis tests r=msbutler a=stevendanna

This adds a bit of randomized testing to stress incremental tenant backups.

Tenant backups were previously untenable because of non-MVCC operations. Now that we support MVCC-compatible bulk operations, they should in principal be possible.

Epic: CRDB-20264

Release note: None

94442: opt: fix minor mistakes in legacy disjunction stats logic r=mgartner a=mgartner

These subtle bugs were discovered while creating #94439, the backport
of #89358 and #94389 to v22.1. They have already been included in that
backport. The backport to v22.2 has not yet been created, and it will
include this commit.

Epic: None

Release note: None


Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
mgartner added a commit to mgartner/cockroach that referenced this pull request Jan 3, 2023
These subtle bugs were discovered while creating cockroachdb#94439, the backport
of cockroachdb#89358 and cockroachdb#94389 to v22.1. They have already been included in that
backport. The backport to v22.2 has not yet been created, and it will
include this commit.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this pull request Jan 4, 2023
These subtle bugs were discovered while creating cockroachdb#94439, the backport
of cockroachdb#89358 and cockroachdb#94389 to v22.1. They have already been included in that
backport. The backport to v22.2 has not yet been created, and it will
include this commit.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants