sql: collect histogram stats for all index columns#52448
sql: collect histogram stats for all index columns#52448craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
fce4c7e to
149abb8
Compare
rytaft
left a comment
There was a problem hiding this comment.
[nit] add a release note for performance improvement
Thanks for making this change!
Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/create_stats.go, line 370 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This is a bit confusing, and I don't love it. It's a bit too clever. It's not super clear that we get to this point when
j == 0and the index is an inverted index. I couldn't figure out anything simpler besides two entirely separate paths for forward vs. inverted with quite a bit of duplication. I'm to ideas here.
I think you could move this logic inside addSingleColumnHistogramIfNotExists, with an extra boolean parameter for isInverted. If isInverted is true, you'd just add two stats instead of one.
Not sure if that's any cleaner, but it would reduce a bit of code duplication....
pkg/sql/create_stats.go, line 293 at r2 (raw file):
// In addition to the index columns, we collect stats on up to maxNonIndexCols // other columns from the table. We only collect histograms for index columns, // plus any other boolean columns (where the "histogram" is tiny).
not your change, but while you're here, can you change this to "boolean or enum"?
pkg/sql/create_stats.go, line 304 at r2 (raw file):
// requestedStats set. If the columnIDs were not already in the set, it // returns true. trackStatsIfNotExists := func(colIDs []descpb.ColumnID) bool {
Nice encapsulation here
pkg/sql/create_stats.go, line 314 at r2 (raw file):
// addSingleColumnHistogramIfNotExists appends histogram column stats for // the given column ID if the colStats if it does not already exist.
needs to be reworded: "if the colStats if it does"
|
Doesn't have to be part of this PR, but one thing we shouldn't forget to do at some point is update the stats we have collected in our opt testfixtures for TPC-C, TPC-H, etc to include histograms on all index columns. If you don't want to bother with it now, maybe open an issue. |
149abb8 to
4dd9b67
Compare
mgartner
left a comment
There was a problem hiding this comment.
Filed an issue here: #52484
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/create_stats.go, line 370 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think you could move this logic inside
addSingleColumnHistogramIfNotExists, with an extra boolean parameter forisInverted. IfisInvertedis true, you'd just add two stats instead of one.Not sure if that's any cleaner, but it would reduce a bit of code duplication....
Great idea. I think it's much simpler now. I seemed to forget that inverted indexes can't be multi-column, which over-complicated things.
pkg/sql/create_stats.go, line 293 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
not your change, but while you're here, can you change this to "boolean or enum"?
Done.
pkg/sql/create_stats.go, line 304 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Nice encapsulation here
😊
pkg/sql/create_stats.go, line 314 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
needs to be reworded: "if the colStats if it does"
Done.
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
pkg/sql/create_stats.go, line 315 at r3 (raw file):
// addSingleColumnStatIfNotExists appends column stats for the given column // ID if they have not already been added. addSingleColumnStatIfNotExists := func(colID descpb.ColumnID, isInverted bool) {
Maybe change the name of this function to addIndexColumnStatIfNotExists and specify in the comment that we collect histograms on all index columns?
pkg/sql/create_stats.go, line 323 at r3 (raw file):
} // Only generate a histogram for forward indexes.
[nit] this comment is a bit confusing since we do generate a histogram for inverted indexes in the next block.
pkg/sql/create_stats.go, line 382 at r3 (raw file):
// Only collect multi-column stats if enabled. if !multiColEnabled { break
don't you want this to be continue so addSingleColumnStatIfNotExists will be called? Maybe add a test with an index with 3 columns to test this case....
This commit enables histogram stats collection for all indexed columns and any columns referenced in a partial index predicate expression. It is probable that these columns will be used in query filters, so collecting stats on them should improve the optimizer's ability to estimate the costs of query plans. Release note (performance improvement): Previously, histogram statistics were only collected for the first column of each index. Now they are collected for each indexed column, allowing the optimizer to more accurately estimate costs of query plans.
4dd9b67 to
f4e865a
Compare
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/create_stats.go, line 315 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Maybe change the name of this function to
addIndexColumnStatIfNotExistsand specify in the comment that we collect histograms on all index columns?
Done.
pkg/sql/create_stats.go, line 323 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] this comment is a bit confusing since we do generate a histogram for inverted indexes in the next block.
Removed.
pkg/sql/create_stats.go, line 382 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
don't you want this to be continue so
addSingleColumnStatIfNotExistswill be called? Maybe add a test with an index with 3 columns to test this case....
Good catch. It also only shows up as a bug when multi column stats are disable, so I've done that for these new tests.
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r4.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
|
bors r=rytaft |
|
Build succeeded: |
This commit enables histogram stats collection for all indexed columns
and any columns referenced in a partial index predicate expression.
It is probable that these columns will be used in query filters, so
collecting stats on them should improve the optimizer's ability to
estimate the costs of query plans.
Fixes #50829
Release note: None