sql/stats: generate statistics forecasts#86078
sql/stats: generate statistics forecasts#86078craig[bot] merged 4 commits intocockroachdb:masterfrom
Conversation
a6bc1a5 to
c906f6a
Compare
|
Opening this to get some CI testing. |
5be1a26 to
d4b9599
Compare
|
Ok, this is RFAL. |
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 5 of 6 files at r1, 12 of 12 files at r5, 2 of 2 files at r6, 7 of 7 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @yuzefovich)
pkg/sql/opt/exec/explain/emit.go line 495 at r7 (raw file):
} duration = fmt.Sprintf( "; stats collected %s ago", humanizeutil.LongDuration(timeSinceStats),
I feel like it might still be useful to report how long ago stats were collected, even if we're using a forecast -- what do you think? Would it be too much info?
pkg/sql/stats/forecast.go line 259 at r5 (raw file):
hist, err := predictHistogram( ctx, evalCtx, observed, forecastAt, minRequiredFit, nonNullRowCount,
Why don't you also remove it from this function?
pkg/sql/stats/quantile_test.go line 49 at r5 (raw file):
for i := 0; i < 5; i++ { t.Run(fmt.Sprintf("%v/%v", colType.Name(), i), func(t *testing.T) { hist, rowCount := randHist(evalCtx, colType, rng)
Can you remove evalCtx from randHist?
pkg/sql/stats/quantile_test.go line 571 at r5 (raw file):
for i, tc := range testCases { t.Run(strconv.Itoa(i), func(t *testing.T) { hist, err := tc.qfun.toHistogram(evalCtx, types.Float, tc.rows)
Can you remove evalCtx from toHistogram?
pkg/sql/stats/quantile_test.go line 869 at r5 (raw file):
return } cmp, err := res.CompareError(evalCtx, tc.dat)
Can you just pass nil here?
pkg/sql/stats/quantile_test.go line 1136 at r5 (raw file):
return } cmp, err := d.CompareError(evalCtx, tc.dat)
ditto
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 5 of 6 files at r1, 12 of 12 files at r5, 2 of 2 files at r6, 7 of 7 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/sql/opt/exec/execbuilder/testdata/forecast line 168 at r8 (raw file):
]' query TTTIIII
Can this test be flaky because the stats cache hasn't been loaded?
When forecasting table statistics, we don't need a full *eval.Context. We can simply use a nil *eval.Context as a tree.CompareContext. This means we don't have to plumb an eval.Context into the stats cache. Assists: cockroachdb#79872 Release note: None
As of this commit, we now try to generate statistics forecasts for every column of every table. This happens whenever statistics are loaded into or refreshed in the stats cache. We use only the forecasts that fit the historical collected statistics very well, meaning we have high confidence in their accuracy. Fixes: cockroachdb#79872 Release note (performance improvement): Enable table statistics forecasts, which predict future statistics based on historical collected statistics. Forecasts help the optimizer produce better plans for queries that read data modified after the latest statistics collection. We use only the forecasts that fit the historical collected statistics very well, meaning we have high confidence in their accuracy. Forecasts can be viewed using `SHOW STATISTICS FOR TABLE ... WITH FORECAST`.
When using statistics forecasts, add the forecast time (which could be in the future) to EXPLAIN output. This both indicates that forecasts are in use, and gives us an idea of how up-to-date / ahead they are. Assists: cockroachdb#79872 Release note: None
Add a few simple testcases for usage of statistics forecasts by the optimizer. Assists: cockroachdb#79872 Release note: None
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner, @michae2, @rytaft, and @yuzefovich)
pkg/sql/opt/exec/execbuilder/testdata/forecast line 168 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Can this test be flaky because the stats cache hasn't been loaded?
Querying a table will force the stats cache to load the stats for that table, so I don't think that type of flake is possible, IIU.
pkg/sql/opt/exec/explain/emit.go line 495 at r7 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I feel like it might still be useful to report how long ago stats were collected, even if we're using a forecast -- what do you think? Would it be too much info?
I like it! Done.
pkg/sql/stats/forecast.go line 259 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Why don't you also remove it from this function?
Done.
pkg/sql/stats/quantile_test.go line 49 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Can you remove
evalCtxfromrandHist?
Done.
pkg/sql/stats/quantile_test.go line 571 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Can you remove
evalCtxfromtoHistogram?
Done.
pkg/sql/stats/quantile_test.go line 869 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Can you just pass
nilhere?
It needs to be a typed nil, so I don't think I can.
|
TFTRs! bors r=rytaft,yuzefovich |
|
Build succeeded: |
sql/stats: use nil eval.Context as CompareContext when forecasting
When forecasting table statistics, we don't need a full *eval.Context.
We can simply use a nil *eval.Context as a tree.CompareContext. This
means we don't have to plumb an eval.Context into the stats cache.
Assists: #79872
Release note: None
sql/stats: generate statistics forecasts in the stats cache
As of this commit, we now try to generate statistics forecasts for every
column of every table. This happens whenever statistics are loaded into
or refreshed in the stats cache. We use only the forecasts that fit the
historical collected statistics very well, meaning we have high
confidence in their accuracy.
Fixes: #79872
Release note (performance improvement): Enable table statistics
forecasts, which predict future statistics based on historical collected
statistics. Forecasts help the optimizer produce better plans for
queries that read data modified after the latest statistics collection.
We use only the forecasts that fit the historical collected statistics
very well, meaning we have high confidence in their accuracy. Forecasts
can be viewed using
SHOW STATISTICS FOR TABLE ... WITH FORECAST.sql: show forecasted stats time in EXPLAIN
When using statistics forecasts, add the forecast time (which could be
in the future) to EXPLAIN output. This both indicates that forecasts are
in use, and gives us an idea of how up-to-date / ahead they are.
Assists: #79872
Release note: None
sql/opt: add tests for statistics forecasts
Add a few simple testcases for usage of statistics forecasts by the
optimizer.
Assists: #79872
Release note: None
Release justification: Enable feature before we get too far into
stability period.