-
Notifications
You must be signed in to change notification settings - Fork 4.1k
sql/stats: support SHOW HISTOGRAM for forecasted and merged stats #86358
Copy link
Copy link
Open
Labels
A-sql-table-statsTable statistics (and their automatic refresh).Table statistics (and their automatic refresh).C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)T-sql-queriesSQL Queries TeamSQL Queries TeamX-nostaleMarks an issue/pr that should be ignored by the stale botMarks an issue/pr that should be ignored by the stale bot
Description
Follow up from #79872: we currently support SHOW STATISTICS ... WITH FORECAST but the histogram_id field is always 0 for forecasts, and SHOW HISTOGRAM does not work.
For example:
CREATE TABLE h (a INT PRIMARY KEY) WITH (sql_stats_automatic_collection_enabled = false);
ALTER TABLE h INJECT STATISTICS '[
{
"avg_size": 1,
"columns": [
"a"
],
"created_at": "2022-08-13 00:00:00.000000",
"distinct_count": 3,
"histo_buckets": [
{
"distinct_range": 0,
"num_eq": 1,
"num_range": 0,
"upper_bound": "4"
},
{
"distinct_range": 0,
"num_eq": 1,
"num_range": 0,
"upper_bound": "5"
},
{
"distinct_range": 0,
"num_eq": 1,
"num_range": 0,
"upper_bound": "6"
}
],
"histo_col_type": "INT8",
"histo_version": 2,
"name": "__auto__",
"null_count": 0,
"row_count": 3
},
{
"avg_size": 1,
"columns": [
"a"
],
"created_at": "2022-08-14 00:00:00.000000",
"distinct_count": 3,
"histo_buckets": [
{
"distinct_range": 0,
"num_eq": 0,
"num_range": 0,
"upper_bound": "7"
},
{
"distinct_range": 0,
"num_eq": 1,
"num_range": 0,
"upper_bound": "8"
},
{
"distinct_range": 0,
"num_eq": 1,
"num_range": 0,
"upper_bound": "9"
}
],
"histo_col_type": "INT8",
"histo_version": 2,
"name": "__auto__",
"null_count": 0,
"row_count": 3
},
{
"avg_size": 1,
"columns": [
"a"
],
"created_at": "2022-08-15 00:00:00.000000",
"distinct_count": 3,
"histo_buckets": [
{
"distinct_range": 0,
"num_eq": 1,
"num_range": 0,
"upper_bound": "10"
},
{
"distinct_range": 0,
"num_eq": 1,
"num_range": 0,
"upper_bound": "11"
},
{
"distinct_range": 0,
"num_eq": 1,
"num_range": 0,
"upper_bound": "12"
}
],
"histo_col_type": "INT8",
"histo_version": 2,
"name": "__auto__",
"null_count": 0,
"row_count": 3
}
]';
SHOW STATISTICS FOR TABLE h WITH FORECAST;We can show histograms of the collected statistics, but not the forecasted statistics:
demo@127.0.0.1:26257/defaultdb> SHOW STATISTICS FOR TABLE h WITH FORECAST;
statistics_name | column_names | created | row_count | distinct_count | null_count | avg_size | histogram_id
------------------+--------------+---------------------+-----------+----------------+------------+----------+---------------------
__auto__ | {a} | 2022-08-13 00:00:00 | 3 | 3 | 0 | 1 | 788759343272853505
__auto__ | {a} | 2022-08-14 00:00:00 | 3 | 3 | 0 | 1 | 788759343275540481
__auto__ | {a} | 2022-08-15 00:00:00 | 3 | 3 | 0 | 1 | 788759343277932545
__forecast__ | {a} | 2022-08-16 00:00:00 | 3 | 3 | 0 | 1 | 0
(4 rows)
Time: 4ms total (execution 4ms / network 0ms)
demo@127.0.0.1:26257/defaultdb> SHOW HISTOGRAM 788759343272853505;
upper_bound | range_rows | distinct_range_rows | equal_rows
--------------+------------+---------------------+-------------
4 | 0 | 0 | 1
5 | 0 | 0 | 1
6 | 0 | 0 | 1
(3 rows)
Time: 2ms total (execution 2ms / network 0ms)
demo@127.0.0.1:26257/defaultdb> SHOW HISTOGRAM 788759343277932545;
upper_bound | range_rows | distinct_range_rows | equal_rows
--------------+------------+---------------------+-------------
10 | 0 | 0 | 1
11 | 0 | 0 | 1
12 | 0 | 0 | 1
(3 rows)
Time: 2ms total (execution 2ms / network 0ms)
demo@127.0.0.1:26257/defaultdb> SHOW HISTOGRAM 0;
ERROR: histogram 0 not found
As a workaround we can use SHOW STATISTICS USING JSON ... WITH FORECAST but it's pretty ugly:
demo@127.0.0.1:26257/defaultdb> SELECT jsonb_array_elements(stat->'histo_buckets')
FROM (
SELECT jsonb_array_elements(statistics) AS stat
FROM [SHOW STATISTICS USING JSON FOR TABLE h WITH FORECAST]
)
WHERE stat->>'name' = '__forecast__';
jsonb_array_elements
---------------------------------------------------------------------------
{"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "13"}
{"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "14"}
{"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "15"}
(3 rows)
Time: 4ms total (execution 4ms / network 0ms)
I think the trick to doing this will be to encode the table ID and column ID in the histogram_id.
Jira issue: CRDB-18713
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
A-sql-table-statsTable statistics (and their automatic refresh).Table statistics (and their automatic refresh).C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)T-sql-queriesSQL Queries TeamSQL Queries TeamX-nostaleMarks an issue/pr that should be ignored by the stale botMarks an issue/pr that should be ignored by the stale bot
Type
Projects
Status
Backlog