sql/stats: generate statistics forecasts by michae2 · Pull Request #86078 · cockroachdb/cockroach

michae2 · 2022-08-13T05:42:56Z

sql/stats: use nil eval.Context as CompareContext when forecasting

When forecasting table statistics, we don't need a full *eval.Context.
We can simply use a nil *eval.Context as a tree.CompareContext. This
means we don't have to plumb an eval.Context into the stats cache.

Assists: #79872

Release note: None

sql/stats: generate statistics forecasts in the stats cache

As of this commit, we now try to generate statistics forecasts for every
column of every table. This happens whenever statistics are loaded into
or refreshed in the stats cache. We use only the forecasts that fit the
historical collected statistics very well, meaning we have high
confidence in their accuracy.

Fixes: #79872

Release note (performance improvement): Enable table statistics
forecasts, which predict future statistics based on historical collected
statistics. Forecasts help the optimizer produce better plans for
queries that read data modified after the latest statistics collection.
We use only the forecasts that fit the historical collected statistics
very well, meaning we have high confidence in their accuracy. Forecasts
can be viewed using SHOW STATISTICS FOR TABLE ... WITH FORECAST.

sql: show forecasted stats time in EXPLAIN

When using statistics forecasts, add the forecast time (which could be
in the future) to EXPLAIN output. This both indicates that forecasts are
in use, and gives us an idea of how up-to-date / ahead they are.

Assists: #79872

Release note: None

sql/opt: add tests for statistics forecasts

Add a few simple testcases for usage of statistics forecasts by the
optimizer.

Assists: #79872

Release note: None

Release justification: Enable feature before we get too far into
stability period.

cockroach-teamcity · 2022-08-13T05:43:04Z

This change is

michae2 · 2022-08-14T05:13:58Z

Opening this to get some CI testing.

michae2 · 2022-08-15T20:22:53Z

Ok, this is RFAL.

rytaft

Reviewed 5 of 6 files at r1, 12 of 12 files at r5, 2 of 2 files at r6, 7 of 7 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @yuzefovich)

pkg/sql/opt/exec/explain/emit.go line 495 at r7 (raw file):

							}
							duration = fmt.Sprintf(
								"; stats collected %s ago", humanizeutil.LongDuration(timeSinceStats),

I feel like it might still be useful to report how long ago stats were collected, even if we're using a forecast -- what do you think? Would it be too much info?

pkg/sql/stats/forecast.go line 259 at r5 (raw file):

		hist, err := predictHistogram(
			ctx, evalCtx, observed, forecastAt, minRequiredFit, nonNullRowCount,

Why don't you also remove it from this function?

pkg/sql/stats/quantile_test.go line 49 at r5 (raw file):

			for i := 0; i < 5; i++ {
				t.Run(fmt.Sprintf("%v/%v", colType.Name(), i), func(t *testing.T) {
					hist, rowCount := randHist(evalCtx, colType, rng)

Can you remove evalCtx from randHist?

pkg/sql/stats/quantile_test.go line 571 at r5 (raw file):

	for i, tc := range testCases {
		t.Run(strconv.Itoa(i), func(t *testing.T) {
			hist, err := tc.qfun.toHistogram(evalCtx, types.Float, tc.rows)

Can you remove evalCtx from toHistogram?

pkg/sql/stats/quantile_test.go line 869 at r5 (raw file):

				return
			}
			cmp, err := res.CompareError(evalCtx, tc.dat)

Can you just pass nil here?

pkg/sql/stats/quantile_test.go line 1136 at r5 (raw file):

				return
			}
			cmp, err := d.CompareError(evalCtx, tc.dat)

ditto

yuzefovich

Reviewed 5 of 6 files at r1, 12 of 12 files at r5, 2 of 2 files at r6, 7 of 7 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @michae2)

pkg/sql/opt/exec/execbuilder/testdata/forecast line 168 at r8 (raw file):

]'

query TTTIIII

Can this test be flaky because the stats cache hasn't been loaded?

When forecasting table statistics, we don't need a full *eval.Context. We can simply use a nil *eval.Context as a tree.CompareContext. This means we don't have to plumb an eval.Context into the stats cache. Assists: cockroachdb#79872 Release note: None

As of this commit, we now try to generate statistics forecasts for every column of every table. This happens whenever statistics are loaded into or refreshed in the stats cache. We use only the forecasts that fit the historical collected statistics very well, meaning we have high confidence in their accuracy. Fixes: cockroachdb#79872 Release note (performance improvement): Enable table statistics forecasts, which predict future statistics based on historical collected statistics. Forecasts help the optimizer produce better plans for queries that read data modified after the latest statistics collection. We use only the forecasts that fit the historical collected statistics very well, meaning we have high confidence in their accuracy. Forecasts can be viewed using `SHOW STATISTICS FOR TABLE ... WITH FORECAST`.

When using statistics forecasts, add the forecast time (which could be in the future) to EXPLAIN output. This both indicates that forecasts are in use, and gives us an idea of how up-to-date / ahead they are. Assists: cockroachdb#79872 Release note: None

Add a few simple testcases for usage of statistics forecasts by the optimizer. Assists: cockroachdb#79872 Release note: None

michae2

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner, @michae2, @rytaft, and @yuzefovich)

pkg/sql/opt/exec/execbuilder/testdata/forecast line 168 at r8 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Can this test be flaky because the stats cache hasn't been loaded?

Querying a table will force the stats cache to load the stats for that table, so I don't think that type of flake is possible, IIU.

pkg/sql/opt/exec/explain/emit.go line 495 at r7 (raw file):