Skip to content

sql: collect histogram stats for all index columns#52448

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
mgartner:collect-partial-index-predicate-stats
Aug 7, 2020
Merged

sql: collect histogram stats for all index columns#52448
craig[bot] merged 1 commit intocockroachdb:masterfrom
mgartner:collect-partial-index-predicate-stats

Conversation

@mgartner
Copy link
Copy Markdown
Contributor

@mgartner mgartner commented Aug 6, 2020

This commit enables histogram stats collection for all indexed columns
and any columns referenced in a partial index predicate expression.
It is probable that these columns will be used in query filters, so
collecting stats on them should improve the optimizer's ability to
estimate the costs of query plans.

Fixes #50829

Release note: None

@mgartner mgartner requested a review from rytaft August 6, 2020 00:54
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Comment thread pkg/sql/create_stats.go Outdated
@mgartner mgartner force-pushed the collect-partial-index-predicate-stats branch from fce4c7e to 149abb8 Compare August 6, 2020 01:05
Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] add a release note for performance improvement

:lgtm: Thanks for making this change!

Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/create_stats.go, line 370 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This is a bit confusing, and I don't love it. It's a bit too clever. It's not super clear that we get to this point when j == 0 and the index is an inverted index. I couldn't figure out anything simpler besides two entirely separate paths for forward vs. inverted with quite a bit of duplication. I'm to ideas here.

I think you could move this logic inside addSingleColumnHistogramIfNotExists, with an extra boolean parameter for isInverted. If isInverted is true, you'd just add two stats instead of one.

Not sure if that's any cleaner, but it would reduce a bit of code duplication....


pkg/sql/create_stats.go, line 293 at r2 (raw file):

// In addition to the index columns, we collect stats on up to maxNonIndexCols
// other columns from the table. We only collect histograms for index columns,
// plus any other boolean columns (where the "histogram" is tiny).

not your change, but while you're here, can you change this to "boolean or enum"?


pkg/sql/create_stats.go, line 304 at r2 (raw file):

	// requestedStats set. If the columnIDs were not already in the set, it
	// returns true.
	trackStatsIfNotExists := func(colIDs []descpb.ColumnID) bool {

Nice encapsulation here


pkg/sql/create_stats.go, line 314 at r2 (raw file):

	// addSingleColumnHistogramIfNotExists appends histogram column stats for
	// the given column ID if the colStats if it does not already exist.

needs to be reworded: "if the colStats if it does"

@rytaft
Copy link
Copy Markdown
Collaborator

rytaft commented Aug 6, 2020

Doesn't have to be part of this PR, but one thing we shouldn't forget to do at some point is update the stats we have collected in our opt testfixtures for TPC-C, TPC-H, etc to include histograms on all index columns. If you don't want to bother with it now, maybe open an issue.

Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed an issue here: #52484

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/create_stats.go, line 370 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you could move this logic inside addSingleColumnHistogramIfNotExists, with an extra boolean parameter for isInverted. If isInverted is true, you'd just add two stats instead of one.

Not sure if that's any cleaner, but it would reduce a bit of code duplication....

Great idea. I think it's much simpler now. I seemed to forget that inverted indexes can't be multi-column, which over-complicated things.


pkg/sql/create_stats.go, line 293 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

not your change, but while you're here, can you change this to "boolean or enum"?

Done.


pkg/sql/create_stats.go, line 304 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Nice encapsulation here

😊


pkg/sql/create_stats.go, line 314 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

needs to be reworded: "if the colStats if it does"

Done.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/create_stats.go, line 315 at r3 (raw file):

	// addSingleColumnStatIfNotExists appends column stats for the given column
	// ID if they have not already been added.
	addSingleColumnStatIfNotExists := func(colID descpb.ColumnID, isInverted bool) {

Maybe change the name of this function to addIndexColumnStatIfNotExists and specify in the comment that we collect histograms on all index columns?


pkg/sql/create_stats.go, line 323 at r3 (raw file):

		}

		// Only generate a histogram for forward indexes.

[nit] this comment is a bit confusing since we do generate a histogram for inverted indexes in the next block.


pkg/sql/create_stats.go, line 382 at r3 (raw file):

			// Only collect multi-column stats if enabled.
			if !multiColEnabled {
				break

don't you want this to be continue so addSingleColumnStatIfNotExists will be called? Maybe add a test with an index with 3 columns to test this case....

This commit enables histogram stats collection for all indexed columns
and any columns referenced in a partial index predicate expression.
It is probable that these columns will be used in query filters, so
collecting stats on them should improve the optimizer's ability to
estimate the costs of query plans.

Release note (performance improvement): Previously, histogram statistics
were only collected for the first column of each index. Now they are
collected for each indexed column, allowing the optimizer to more
accurately estimate costs of query plans.
@mgartner mgartner force-pushed the collect-partial-index-predicate-stats branch from 4dd9b67 to f4e865a Compare August 6, 2020 20:29
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


pkg/sql/create_stats.go, line 315 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Maybe change the name of this function to addIndexColumnStatIfNotExists and specify in the comment that we collect histograms on all index columns?

Done.


pkg/sql/create_stats.go, line 323 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] this comment is a bit confusing since we do generate a histogram for inverted indexes in the next block.

Removed.


pkg/sql/create_stats.go, line 382 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

don't you want this to be continue so addSingleColumnStatIfNotExists will be called? Maybe add a test with an index with 3 columns to test this case....

Good catch. It also only shows up as a bug when multi column stats are disable, so I've done that for these new tests.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

@mgartner
Copy link
Copy Markdown
Contributor Author

mgartner commented Aug 7, 2020

bors r=rytaft

@craig
Copy link
Copy Markdown
Contributor

craig Bot commented Aug 7, 2020

Build succeeded:

@craig craig Bot merged commit 9ab5dbc into cockroachdb:master Aug 7, 2020
@mgartner mgartner deleted the collect-partial-index-predicate-stats branch August 7, 2020 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: trigger histogram collection for columns in partial index predicate

3 participants