release-22.1: memo: better selectivity estimates on single-table ORed predicates#94439
Conversation
|
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
|
Do not review yet. I need to figure out why some of the join stats have changed. |
165059d to
1c32595
Compare
|
Ok, this is ready for review. |
These subtle bugs were discovered while creating cockroachdb#94439, the backport of cockroachdb#89358 and cockroachdb#94389 to v22.1. They have already been included in that backport. The backport to v22.2 has not yet been created, and it will include this commit. Release note: None
Fixes cockroachdb#75090 Currently, the optimizer only does a good job of estimating the selectivity of single-table ORed predicates when all disjuncts are tight constraints. Otherwise we give up and go with the default selectivity of 1/3 for the entire filter. If any of the disjuncts has a selectivity above 1/3, then the total selectivity should be at least as high as the selectivity of that disjunct. Any failure to find a tight constraint for a given disjunct should only cause a selectivity of 1/3 to be used for that disjunct, not the entire filter. This is addressed by using the method of combining ORed selectivities introduced by cockroachdb#74303: ``` Given predicates A and B and probability function P: P(A or B) = P(A) + P(B) - P(A and B) ``` The above formula is applied n-1 times, given n disjuncts. For each disjunct which is not a tight constraint, 1/3 is used for that predicate's selectivity in the formula. Release note (bug fix): This patch fixes incorrect selectivity estimation for queries with ORed predicates all referencing a common single table.
1c32595 to
1b9d287
Compare
msirek
left a comment
There was a problem hiding this comment.
Reviewed 3 of 20 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 3175 at r4 (raw file):
// not an Or, bump the number of unapplied disjuncts. numUnappliedDisjuncts++ return
It seems like we want the new flag to control the setting of unconstrained and incrementing of numUnappliedDisjuncts instead of just removing the old logic.
pkg/sql/opt/memo/statistics_builder.go line 3194 at r4 (raw file):
collectConstraints(or.Right) return constraints, numUnappliedDisjuncts
Same here. It seems like we want to follow the old behavior when the flag is false. Otherwise we're changing the result, even when the flag is false.
Code quote:
not an Or, bump the number of unapplied disjuncts.|
Previously, msirek (Mark Sirek) wrote…
Notice the check I added to gate the old logic above: |
msirek
left a comment
There was a problem hiding this comment.
Reviewed 1 of 13 files at r1, 17 of 20 files at r4, 20 of 20 files at r5, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
-- commits line 34 at r5:
Should this say "defaults to false" ?
pkg/sql/opt/memo/statistics_builder.go line 3175 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Notice the check I added to gate the old logic above:
if numUnappliedDisjuncts == 0 {. This matches the old logic which needing to contort the function signature any more. I originally missed this, which is why I created #94442 so these fixes make it back into master and the v22.2 backport.
OK, I see.
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewed 1 of 13 files at r1, 20 of 20 files at r4, 20 of 20 files at r5, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
This commit adds the `optimizer_use_improved_disjunction_stats` session setting that defaults to `false`. When `true`, the improved disjunction logic added in cockroachdb#89358 is performed. When `false`, the logic reverts to the previous logic. This setting will allow use to backport the new logic without impacting query plans of running clusters by defaulting the setting to false. Release note: None
1b9d287 to
ad2e713
Compare
Good catch. I've updated the commit and also made a note in the PR description to highlight that this differs from the original commit. |
|
TFTRs! |
92327: backupccl: add tenant backup nemesis tests r=msbutler a=stevendanna This adds a bit of randomized testing to stress incremental tenant backups. Tenant backups were previously untenable because of non-MVCC operations. Now that we support MVCC-compatible bulk operations, they should in principal be possible. Epic: CRDB-20264 Release note: None 94442: opt: fix minor mistakes in legacy disjunction stats logic r=mgartner a=mgartner These subtle bugs were discovered while creating #94439, the backport of #89358 and #94389 to v22.1. They have already been included in that backport. The backport to v22.2 has not yet been created, and it will include this commit. Epic: None Release note: None Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
These subtle bugs were discovered while creating cockroachdb#94439, the backport of cockroachdb#89358 and cockroachdb#94389 to v22.1. They have already been included in that backport. The backport to v22.2 has not yet been created, and it will include this commit. Release note: None
These subtle bugs were discovered while creating cockroachdb#94439, the backport of cockroachdb#89358 and cockroachdb#94389 to v22.1. They have already been included in that backport. The backport to v22.2 has not yet been created, and it will include this commit. Release note: None
Backport:
Please see individual PRs for details.
Note that the setting
optimizer_use_improved_disjunction_statsdefaults tofalseinthis backport (differing from the original commit) so that users must opt into the new
behavior.
Release justification: This backports improved statistics logic behind a session setting.
/cc @cockroachdb/release