sql,opt: collect small histograms on all non-index columns, update test fixtures #52905
craig[bot] merged 2 commits into cockroachdb:master
pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 78 at r2 (raw file):
These were removed on purpose. They may not have been removed from all fixtures, @rohany should know what was updated and what wasn't.
The schemas in the optimizer are more up to date right now than the ones in workload.
rytaft
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rohany)
pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 78 at r2 (raw file):
Previously, RaduBerinde wrote…
These were removed on purpose. They may not have been removed from all fixtures, @rohany should know what was updated and what wasn't.
Got it, thanks. In that case I'll revert these changes and manually delete the indexes from my running cluster.
pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 146 at r2 (raw file):
s_data varchar(50),
primary key (s_w_id, s_i_id),
index stock_item_fk_idx (s_i_id),
Should this one be removed too?
pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 146 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
No, I believe that one was shown to be beneficial.
No -- initial testing showed that we still need that one, but I'm not sure why yet.
rytaft
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 78 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Got it, thanks. In that case I'll revert these changes and manually delete the indexes from my running cluster.
Done.
pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 146 at r2 (raw file):
Previously, RaduBerinde wrote…
No, I believe that one was shown to be beneficial.
Got it - thanks.
RaduBerinde
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/distsql_plan_stats.go, line 43 at r1 (raw file):
const histogramSamples = 10000
const histogramBuckets = 200
This constant is duplicated now
This commit updates the logic for automatic statistics collection so that 2-bucket histograms are collected on non-index columns. 200-bucket histograms are still collected for all index columns. Fixes cockroachdb#49374 Release note (performance improvement): Maximum and minimum values, represented as 2-bucket histograms, are now collected for all non-index columns (up to 100 columns per table) as part of automatic statistics collection. 200-bucket histograms are still collected for all index columns. This change enables the optimizer to make better cardinality estimates and may result in better query plans in some cases.
This commit updates the stats in the TPC-C and TPC-H optimizer tests to reflect the stats that are now collected automatically. These stats include full histograms for all index columns, as well as 2-bucket histograms for all other columns. This commit also fixes the TPC-C schema to include some missing foreign key indexes. Fixes cockroachdb#52484 Release note: None
rytaft
left a comment
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
pkg/sql/distsql_plan_stats.go, line 43 at r1 (raw file):
Previously, RaduBerinde wrote…
This constant is duplicated now
Done.
bors r+
👎 Rejected by PR status
bors r+
🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.
Build succeeded:
sql: collect small histograms on all non-index columns
This commit updates the logic for automatic statistics collection
so that 2-bucket histograms are collected on non-index columns.
200-bucket histograms are still collected for all index columns.
Fixes #49374
Release note (performance improvement): Maximum and minimum
values, represented as 2-bucket histograms, are now collected
for all non-index columns (up to 100 columns per table) as part of
automatic statistics collection. 200-bucket histograms are still
collected for all index columns. This change enables the optimizer
to make better cardinality estimates and may result in better query
plans in some cases.
opt: update TPC-C and TPC-H test fixtures with new stats
This commit updates the stats in the TPC-C and TPC-H optimizer
tests to reflect the stats that are now collected automatically.
These stats include full histograms for all index columns, as well
as 2-bucket histograms for all other columns.
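
To illustrate why a 2-bucket (minimum/maximum) histogram can help cardinality estimates, here is a minimal sketch of a range estimate. It assumes a numeric column and a uniform spread of values between the minimum and maximum; the `minMax` type and `estimateRange` function are hypothetical and do not reflect the optimizer's actual estimator.

```go
package main

import "fmt"

// minMax is a hypothetical summary of a 2-bucket histogram: the smallest
// and largest values seen for a numeric column, plus the table row count.
type minMax struct {
	min, max float64
	rows     float64
}

// estimateRange estimates how many rows satisfy lo <= x <= hi.
// It assumes values are uniformly distributed between min and max,
// which is an assumption of this sketch only.
func estimateRange(h minMax, lo, hi float64) float64 {
	// Clamp the requested range to the known bounds of the column.
	if lo < h.min {
		lo = h.min
	}
	if hi > h.max {
		hi = h.max
	}
	if lo > hi {
		// The predicate lies entirely outside [min, max].
		return 0
	}
	if h.max == h.min {
		// Every row has the same value, and the range contains it.
		return h.rows
	}
	return h.rows * (hi - lo) / (h.max - h.min)
}

func main() {
	h := minMax{min: 0, max: 100, rows: 1000}
	fmt.Println(estimateRange(h, 25, 75))   // range inside the bounds
	fmt.Println(estimateRange(h, 150, 200)) // range above the maximum
}
```

Without any bounds, an estimator has no way to tell that the second predicate cannot match; with the minimum and maximum known, it can conclude that no rows qualify.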
This commit also fixes the TPC-C schema to include some missing
foreign key indexes.
Fixes #52484
Release note: None