Skip to content

sql,opt: collect small histograms on all non-index columns, update test fixtures#52905

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
rytaft:two-buckets
Aug 17, 2020
Merged

sql,opt: collect small histograms on all non-index columns, update test fixtures#52905
craig[bot] merged 2 commits intocockroachdb:masterfrom
rytaft:two-buckets

Conversation

@rytaft
Copy link
Copy Markdown
Collaborator

@rytaft rytaft commented Aug 17, 2020

sql: collect small histograms on all non-index columns

This commit updates the logic for automatic statistics collection
so that 2-bucket histograms are collected on non-index columns.
200-bucket histograms are still collected for all index columns.

Fixes #49374

Release note (performance improvement): Maximum and minimum
values, represented as 2-bucket histograms, are now collected
for all non-index columns (up to 100 columns per table) as part of
automatic statistics collection. 200-bucket histograrms are still
collected for all index columns. This change enables the optimizer
to make better cardinality estimates and may result in better query
plans in some cases.

opt: update TPC-C and TPC-H test fixtures with new stats

This commit updates the stats in the TPC-C and TPC-H optimizer
tests to reflect the stats that are now collected automatically.
These stats include full histograms for all index columns, as well
as 2-bucket histograms for all other columns.

This commit also fixes the TPC-C schema to include some missing
foreign key indexes.

Fixes #52484

Release note: None

@rytaft rytaft requested a review from mgartner August 17, 2020 15:36
@rytaft rytaft requested a review from a team as a code owner August 17, 2020 15:36
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@RaduBerinde
Copy link
Copy Markdown
Member


pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 78 at r2 (raw file):

    h_data   varchar(24),
    primary key (h_w_id, rowid),
    index history_customer_fk_idx (h_c_w_id, h_c_d_id, h_c_id),

These were removed on purpose. They may not have been removed from all fixtures, @rohany should know what was updated and what wasn't.

@rohany
Copy link
Copy Markdown
Contributor

rohany commented Aug 17, 2020

These were removed on purpose. They may not have been removed from all fixtures, @rohany should know what was updated and what wasn't.

The schemas in the optimizer are more up to date right now than the ones in workload.

Copy link
Copy Markdown
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rohany)


pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 78 at r2 (raw file):

Previously, RaduBerinde wrote…

These were removed on purpose. They may not have been removed from all fixtures, @rohany should know what was updated and what wasn't.

Got it, thanks. In that case I'll revert these changes and manually delete the indexes from my running cluster.


pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 146 at r2 (raw file):

    s_data       varchar(50),
    primary key (s_w_id, s_i_id),
    index stock_item_fk_idx (s_i_id),

should this one be removed too?

@RaduBerinde
Copy link
Copy Markdown
Member


pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 146 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

should this one be removed too?

No, I believe that one was shown to be beneficial.

@rohany
Copy link
Copy Markdown
Contributor

rohany commented Aug 17, 2020

should this one be removed too?

No -- initial testing showed that we still need that one, but I'm not sure why yet.

Copy link
Copy Markdown
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 78 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Got it, thanks. In that case I'll revert these changes and manually delete the indexes from my running cluster.

Done.


pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 146 at r2 (raw file):

Previously, RaduBerinde wrote…

No, I believe that one was shown to be beneficial.

Got it - thanks.

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/distsql_plan_stats.go, line 43 at r1 (raw file):

const histogramSamples = 10000
const histogramBuckets = 200

This constant is duplicated now

rytaft added 2 commits August 17, 2020 14:10
This commit updates the logic for automatic statistics collection
so that 2-bucket histograms are collected on non-index columns.
200-bucket histograms are still collected for all index columns.

Fixes cockroachdb#49374

Release note (performance improvement): Maximum and minimum
values, represented as 2-bucket histograms, are now collected
for all non-index columns (up to 100 columns per table) as part of
automatic statistics collection. 200-bucket histograrms are still
collected for all index columns. This change enables the optimizer
to make better cardinality estimates and may result in better query
plans in some cases.
This commit updates the stats in the TPC-C and TPC-H optimizer
tests to reflect the stats that are now collected automatically.
These stats include full histograms for all index columns, as well
as 2-bucket histograms for all other columns.

This commit also fixes the TPC-C schema to include some missing
foreign key indexes.

Fixes cockroachdb#52484

Release note: None
Copy link
Copy Markdown
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/distsql_plan_stats.go, line 43 at r1 (raw file):

Previously, RaduBerinde wrote…

This constant is duplicated now

Done.

@rytaft
Copy link
Copy Markdown
Collaborator Author

rytaft commented Aug 17, 2020

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 17, 2020

👎 Rejected by PR status

@RaduBerinde RaduBerinde removed the request for review from mgartner August 17, 2020 21:17
@RaduBerinde
Copy link
Copy Markdown
Member

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 17, 2020

🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 17, 2020

Build succeeded:

@craig craig bot merged commit c5bf0f2 into cockroachdb:master Aug 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: update opt stats test fixtures for TPC-C, TPC-H to include histograms on index columns sql, stats: collect small histograms on all columns

4 participants