sql: ignore outer buckets when getting partial statistic extreme bounds#126403
Conversation
DrewKimball
left a comment
There was a problem hiding this comment.
Great job! Just have a test suggestion.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @Uzair5162)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2893 at r1 (raw file):
'{"bar": {"baz": 5}}' 0 0 1 # Verify that the correct partial predicate is used for partial stats using
Can we also have a test where the table actually max or min values, and make sure we don't strip off the corresponding buckets?
Full statistic collections sometimes invoke `addOuterBuckets()` which adds buckets with column-type max and min upper bounds to the histogram. Previously, we used the first (non-null) and last bucket as the "less than" and "greater than" bounds for partial statistics collections using extremes. This results in an incorrect predicate when outer buckets exist in the most recent full statistic, which has been fixed in this commit by ignoring outer buckets when determining bounds. Fixes: cockroachdb#93094 See also: cockroachdb#125950 Release note (bug fix): Fixed a bug when creating partial statistics using extremes (which is disabled by default) where it would occasionally use incorrect extreme values and collect no stats. This occurs when outer buckets were added to the previous histogram to account for extra distinct count.
5556495 to
bd63938
Compare
Uzair5162
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2893 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Can we also have a test where the table actually max or min values, and make sure we don't strip off the corresponding buckets?
Done.
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Uzair5162)
|
bors r+ |
Full statistic collections sometimes invoke
addOuterBuckets()which adds buckets with column-type max and min upper bounds to the histogram. Previously, we used the first (non-null) and last bucket as the "less than" and "greater than" bounds for partial statistics collections using extremes. This results in an incorrect predicate when outer buckets exist in the most recent full statistic, which has been fixed in this commit by ignoring outer buckets when determining bounds.Note that this fix doesn't apply to partial stats created on enum or bool type columns (see #126401) as outer buckets on those histograms don't have
NumEq == 0. Reducing the check to justUpperBound.IsMax() || UpperBound.IsMin()for these types is problematic as for bools, every bucket is a max or min and enums are typically small and behave similarly.Fixes: #93094
See also: #125950
Release note (bug fix): Fixed a bug when creating partial statistics using extremes (which is disabled by default) where it would occasionally use incorrect extreme values and collect no stats. This occurs when outer buckets were added to the previous histogram to account for extra distinct count.