Skip to content

sql: ignore outer buckets when getting partial statistic extreme bounds#126403

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
Uzair5162:fix-partial-stats-at-extremes-with-outer-buckets-gh-93094
Jun 28, 2024
Merged

sql: ignore outer buckets when getting partial statistic extreme bounds#126403
craig[bot] merged 1 commit intocockroachdb:masterfrom
Uzair5162:fix-partial-stats-at-extremes-with-outer-buckets-gh-93094

Conversation

@Uzair5162
Copy link
Copy Markdown
Contributor

Full statistic collections sometimes invoke addOuterBuckets() which adds buckets with column-type max and min upper bounds to the histogram. Previously, we used the first (non-null) and last bucket as the "less than" and "greater than" bounds for partial statistics collections using extremes. This results in an incorrect predicate when outer buckets exist in the most recent full statistic, which has been fixed in this commit by ignoring outer buckets when determining bounds.

Note that this fix doesn't apply to partial stats created on enum or bool type columns (see #126401) as outer buckets on those histograms don't have NumEq == 0. Reducing the check to just UpperBound.IsMax() || UpperBound.IsMin() for these types is problematic as for bools, every bucket is a max or min and enums are typically small and behave similarly.

Fixes: #93094

See also: #125950

Release note (bug fix): Fixed a bug when creating partial statistics using extremes (which is disabled by default) where it would occasionally use incorrect extreme values and collect no stats. This occurs when outer buckets were added to the previous histogram to account for extra distinct count.

@Uzair5162 Uzair5162 requested a review from DrewKimball June 28, 2024 14:47
@Uzair5162 Uzair5162 requested a review from a team as a code owner June 28, 2024 14:47
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great job! Just have a test suggestion.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @Uzair5162)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2893 at r1 (raw file):

'{"bar": {"baz": 5}}'  0           0                    1

# Verify that the correct partial predicate is used for partial stats using

Can we also have a test where the table actually max or min values, and make sure we don't strip off the corresponding buckets?

Full statistic collections sometimes invoke `addOuterBuckets()` which
adds buckets with column-type max and min upper bounds to the histogram.
Previously, we used the first (non-null) and last bucket as the "less
than" and "greater than" bounds for partial statistics collections using
extremes. This results in an incorrect predicate when outer buckets
exist in the most recent full statistic, which has been fixed in this
commit by ignoring outer buckets when determining bounds.

Fixes: cockroachdb#93094

See also: cockroachdb#125950

Release note (bug fix): Fixed a bug when creating partial statistics
using extremes (which is disabled by default) where it would
occasionally use incorrect extreme values and collect no stats. This
occurs when outer buckets were added to the previous histogram to
account for extra distinct count.
@Uzair5162 Uzair5162 force-pushed the fix-partial-stats-at-extremes-with-outer-buckets-gh-93094 branch from 5556495 to bd63938 Compare June 28, 2024 20:06
Copy link
Copy Markdown
Contributor Author

@Uzair5162 Uzair5162 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2893 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Can we also have a test where the table actually max or min values, and make sure we don't strip off the corresponding buckets?

Done.

@Uzair5162 Uzair5162 requested a review from DrewKimball June 28, 2024 20:08
Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Uzair5162)

@Uzair5162
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 28, 2024

@craig craig bot merged commit 047a7ed into cockroachdb:master Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql/stats: addOuterBuckets logic breaks CREATE STATISTICS USING EXTREMES

3 participants