Skip to content

opt: predict future statistics based on historical stats #79872

@rytaft

Description

@rytaft

Is your feature request related to a problem? Please describe.
Currently, we only support collecting table statistics on the entire table, and statistics are only automatically refreshed when ~20% of rows have changed. This is problematic for very large tables where only a portion of the table is regularly updated or queried. As stats become more stale, there is greater likelihood that the optimizer will not choose optimal query plans.

For example, if only the rows most recently inserted into a table are regularly queried, stats on these rows will often be stale. This also gets worse over time: As the table increases in size, the 20% trigger for automatic refreshes will happen less and less frequently, and therefore stats on the recent rows will become more and more stale.

Describe the solution you'd like
We could take advantage of the fact that we store 4-5 historical stats for every column. We could use these historical stats to build a simple regression model (or a more complex model) to predict how the stats have changed since they were last collected. Predictions are only possible for column types where a rate of change can be determined between two values, such as DATE, TIME[TZ], TIMESTAMP[TZ], INT[2|4|8] , FLOAT[4|8], and DECIMAL. This prediction would not be stored on disk, but instead would be calculated on the fly, either inside the stats cache or the statisticsBuilder.

Describe alternatives you've considered
There are a number of alternatives described in #75625 (this proposal is one of them). That RFC proposes a more comprehensive solution, but it is heavier weight.

Additional context
This solution is already in progress in #77070.

Epic CRDB-13963

Jira issue: CRDB-15884

Metadata

Metadata

Assignees

Labels

A-sql-table-statsTable statistics (and their automatic refresh).C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)T-sql-queriesSQL Queries Team

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions