Skip to content

sql/stats: generate statistics forecasts#86078

Merged
craig[bot] merged 4 commits intocockroachdb:masterfrom
michae2:stats_cache
Aug 16, 2022
Merged

sql/stats: generate statistics forecasts#86078
craig[bot] merged 4 commits intocockroachdb:masterfrom
michae2:stats_cache

Conversation

@michae2
Copy link
Copy Markdown
Collaborator

@michae2 michae2 commented Aug 13, 2022

sql/stats: use nil eval.Context as CompareContext when forecasting

When forecasting table statistics, we don't need a full *eval.Context.
We can simply use a nil *eval.Context as a tree.CompareContext. This
means we don't have to plumb an eval.Context into the stats cache.

Assists: #79872

Release note: None

sql/stats: generate statistics forecasts in the stats cache

As of this commit, we now try to generate statistics forecasts for every
column of every table. This happens whenever statistics are loaded into
or refreshed in the stats cache. We use only the forecasts that fit the
historical collected statistics very well, meaning we have high
confidence in their accuracy.

Fixes: #79872

Release note (performance improvement): Enable table statistics
forecasts, which predict future statistics based on historical collected
statistics. Forecasts help the optimizer produce better plans for
queries that read data modified after the latest statistics collection.
We use only the forecasts that fit the historical collected statistics
very well, meaning we have high confidence in their accuracy. Forecasts
can be viewed using SHOW STATISTICS FOR TABLE ... WITH FORECAST.

sql: show forecasted stats time in EXPLAIN

When using statistics forecasts, add the forecast time (which could be
in the future) to EXPLAIN output. This both indicates that forecasts are
in use, and gives us an idea of how up-to-date / ahead they are.

Assists: #79872

Release note: None

sql/opt: add tests for statistics forecasts

Add a few simple testcases for usage of statistics forecasts by the
optimizer.

Assists: #79872

Release note: None


Release justification: Enable feature before we get too far into
stability period.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@michae2 michae2 force-pushed the stats_cache branch 2 times, most recently from a6bc1a5 to c906f6a Compare August 14, 2022 05:07
@michae2 michae2 marked this pull request as ready for review August 14, 2022 05:12
@michae2 michae2 requested a review from a team as a code owner August 14, 2022 05:12
@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Aug 14, 2022

Opening this to get some CI testing.

@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Aug 15, 2022

Ok, this is RFAL.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 6 files at r1, 12 of 12 files at r5, 2 of 2 files at r6, 7 of 7 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @yuzefovich)


pkg/sql/opt/exec/explain/emit.go line 495 at r7 (raw file):

							}
							duration = fmt.Sprintf(
								"; stats collected %s ago", humanizeutil.LongDuration(timeSinceStats),

I feel like it might still be useful to report how long ago stats were collected, even if we're using a forecast -- what do you think? Would it be too much info?


pkg/sql/stats/forecast.go line 259 at r5 (raw file):

		hist, err := predictHistogram(
			ctx, evalCtx, observed, forecastAt, minRequiredFit, nonNullRowCount,

Why don't you also remove it from this function?


pkg/sql/stats/quantile_test.go line 49 at r5 (raw file):

			for i := 0; i < 5; i++ {
				t.Run(fmt.Sprintf("%v/%v", colType.Name(), i), func(t *testing.T) {
					hist, rowCount := randHist(evalCtx, colType, rng)

Can you remove evalCtx from randHist?


pkg/sql/stats/quantile_test.go line 571 at r5 (raw file):

	for i, tc := range testCases {
		t.Run(strconv.Itoa(i), func(t *testing.T) {
			hist, err := tc.qfun.toHistogram(evalCtx, types.Float, tc.rows)

Can you remove evalCtx from toHistogram?


pkg/sql/stats/quantile_test.go line 869 at r5 (raw file):

				return
			}
			cmp, err := res.CompareError(evalCtx, tc.dat)

Can you just pass nil here?


pkg/sql/stats/quantile_test.go line 1136 at r5 (raw file):

				return
			}
			cmp, err := d.CompareError(evalCtx, tc.dat)

ditto

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 6 files at r1, 12 of 12 files at r5, 2 of 2 files at r6, 7 of 7 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @michae2)


pkg/sql/opt/exec/execbuilder/testdata/forecast line 168 at r8 (raw file):

]'

query TTTIIII

Can this test be flaky because the stats cache hasn't been loaded?

When forecasting table statistics, we don't need a full *eval.Context.
We can simply use a nil *eval.Context as a tree.CompareContext. This
means we don't have to plumb an eval.Context into the stats cache.

Assists: cockroachdb#79872

Release note: None
As of this commit, we now try to generate statistics forecasts for every
column of every table. This happens whenever statistics are loaded into
or refreshed in the stats cache. We use only the forecasts that fit the
historical collected statistics very well, meaning we have high
confidence in their accuracy.

Fixes: cockroachdb#79872

Release note (performance improvement): Enable table statistics
forecasts, which predict future statistics based on historical collected
statistics. Forecasts help the optimizer produce better plans for
queries that read data modified after the latest statistics collection.
We use only the forecasts that fit the historical collected statistics
very well, meaning we have high confidence in their accuracy. Forecasts
can be viewed using `SHOW STATISTICS FOR TABLE ... WITH FORECAST`.
When using statistics forecasts, add the forecast time (which could be
in the future) to EXPLAIN output. This both indicates that forecasts are
in use, and gives us an idea of how up-to-date / ahead they are.

Assists: cockroachdb#79872

Release note: None
Add a few simple testcases for usage of statistics forecasts by the
optimizer.

Assists: cockroachdb#79872

Release note: None
Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner, @michae2, @rytaft, and @yuzefovich)


pkg/sql/opt/exec/execbuilder/testdata/forecast line 168 at r8 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Can this test be flaky because the stats cache hasn't been loaded?

Querying a table will force the stats cache to load the stats for that table, so I don't think that type of flake is possible, IIU.


pkg/sql/opt/exec/explain/emit.go line 495 at r7 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I feel like it might still be useful to report how long ago stats were collected, even if we're using a forecast -- what do you think? Would it be too much info?

I like it! Done.


pkg/sql/stats/forecast.go line 259 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Why don't you also remove it from this function?

Done.


pkg/sql/stats/quantile_test.go line 49 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Can you remove evalCtx from randHist?

Done.


pkg/sql/stats/quantile_test.go line 571 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Can you remove evalCtx from toHistogram?

Done.


pkg/sql/stats/quantile_test.go line 869 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Can you just pass nil here?

It needs to be a typed nil, so I don't think I can.

@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Aug 16, 2022

TFTRs!

bors r=rytaft,yuzefovich

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 16, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: predict future statistics based on historical stats

4 participants