RFC and the first PR: #53240
This issue is for discussing what we will do in the future.
Use cases of column statistics:
- join reordering
- filter by decreasing selectivity in PREWHERE
- automatic low-cardinality
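Ordering PREWHERE filters by decreasing selectivity means evaluating the most selective condition first, which is the one with the smallest estimated fraction of rows passing. Later conditions then only see the rows that survive. The sketch below is a minimal illustration under stated assumptions. It assumes each condition already carries an estimated selectivity, and it assumes conditions are independent. `Predicate`, `order_for_prewhere`, `expected_rows_read` and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    # Hypothetical representation of one PREWHERE condition.
    expression: str
    # Estimated fraction of rows that satisfy the condition (0.0 to 1.0).
    selectivity: float


def order_for_prewhere(predicates: list[Predicate]) -> list[Predicate]:
    """Most selective condition (fewest rows passing) first."""
    return sorted(predicates, key=lambda p: p.selectivity)


def expected_rows_read(predicates: list[Predicate], total_rows: int) -> float:
    """Total rows inspected by all filters, assuming independent conditions."""
    rows = float(total_rows)
    inspected = 0.0
    for p in predicates:
        inspected += rows      # this filter runs on every surviving row
        rows *= p.selectivity  # rows that reach the next filter
    return inspected


if __name__ == "__main__":
    preds = [
        Predicate("status = 'ok'", 0.9),
        Predicate("user_id = 42", 0.001),
        Predicate("a BETWEEN 100 AND 200", 0.1),
    ]
    ordered = order_for_prewhere(preds)
    print([p.expression for p in ordered])
    print(expected_rows_read(preds, 1_000_000), expected_rows_read(ordered, 1_000_000))
```

Real cost models would also weigh how expensive each condition is to evaluate. This sketch considers selectivity only.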
Related proposal: #64210
Usability
- cache of statistics
- more grammar sugar
  - `ADD STATISTIC column_name TYPE ALL` to create every kind of useful statistic that we can
  - `DROP/CLEAR/MATERIALIZE STATISTIC column_name` to drop/clear/materialize all statistics when `TYPE ...` is omitted
- support more condition patterns for selectivity estimation
  - `a between 100 and 200` and `a > 100 && a < 200`
  - `a < 100 or a > 200`
- system tables
  - reveal statistics information in `system.parts` and `system.parts_columns`
- support more data types
  - support the `Decimal` type for `tdigest`
- compact statistics files into a single file per part
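The selectivity-estimation patterns listed under Usability can be sketched under one assumption: a sorted sample of column values stands in for a quantile sketch such as `tdigest`. `interval_from_and` and `SampleCDF` are hypothetical names. The AND-ed comparisons behind `a between 100 and 200` (inclusive bounds) and `a > 100 && a < 200` (open bounds) fold into one interval. The OR `a < 100 or a > 200` is estimated as the sum of two disjoint intervals. Overlapping ranges would need inclusion-exclusion instead.

```python
import bisect
import math

# Hypothetical helpers for illustration; a sorted sample stands in for a
# quantile sketch such as tdigest.


def interval_from_and(conditions):
    """Intersect AND-ed comparisons on one column into a single interval.

    conditions: list of (op, value) pairs, op in {'>', '>=', '<', '<='}.
    Returns (lo, lo_inclusive, hi, hi_inclusive).
    """
    lo, lo_inc, hi, hi_inc = -math.inf, False, math.inf, False
    for op, v in conditions:
        if op in (">", ">="):
            inc = op == ">="
            # Keep the tighter bound; at a tie the strict operator wins.
            if v > lo or (v == lo and not inc):
                lo, lo_inc = v, inc
        elif op in ("<", "<="):
            inc = op == "<="
            if v < hi or (v == hi and not inc):
                hi, hi_inc = v, inc
        else:
            raise ValueError(f"unsupported operator: {op}")
    return lo, lo_inc, hi, hi_inc


class SampleCDF:
    """Estimates the fraction of rows inside an interval from a value sample."""

    def __init__(self, sample):
        self.values = sorted(sample)
        self.n = len(self.values)

    def fraction_in(self, lo, lo_inc, hi, hi_inc):
        if self.n == 0:
            return 0.0
        # First sample position inside the lower bound.
        start = (bisect.bisect_left if lo_inc else bisect.bisect_right)(self.values, lo)
        # One past the last sample position inside the upper bound.
        end = (bisect.bisect_right if hi_inc else bisect.bisect_left)(self.values, hi)
        return max(0, end - start) / self.n


if __name__ == "__main__":
    cdf = SampleCDF(range(1000))
    # `a between 100 and 200` is the AND of `a >= 100` and `a <= 200`.
    between = interval_from_and([(">=", 100), ("<=", 200)])
    # `a > 100 && a < 200` has the same shape with open bounds.
    open_range = interval_from_and([(">", 100), ("<", 200)])
    # `a < 100 or a > 200`: the two ranges are disjoint, so fractions add.
    outside = cdf.fraction_in(-math.inf, False, 100, False) + cdf.fraction_in(
        200, False, math.inf, False
    )
    print(cdf.fraction_in(*between), cdf.fraction_in(*open_range), outside)
```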
Functionality
- support `hyperloglog`
  - then automatically decide whether a string column can be stored in low cardinality format
- support `cmsketch`
  - to estimate `a = 1`
- support equi-depth histograms
- support heavy hitters (e.g. the top 20 most frequent values)
  - to estimate `a = 1` better
- support `min_max`
- support sampling: store a configurable number of values
- support more counters
  - NULL values
  - default values
  - deleted values for LWD (lightweight deletes)
- estimate by combining the above statistics
  - e.g. for `a = 1`, first check whether `1` is among the top 20 values of column `a`
- statistics for other tables / materialized views / projections ...
- automatically create & maintain statistics
  - for cheap statistics like `hyperloglog`, `min_max` & `null_count`
  - for frequently queried columns
- support statistics name aliases, e.g. `min_max` and `MinMax` should refer to the same statistics type
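The `hyperloglog` item pairs a distinct-count sketch with an automatic low-cardinality decision. Below is a minimal HyperLogLog in Python together with a hypothetical decision rule. The `blake2b` hash, the choice of `p = 12` and the 10,000 threshold are assumptions chosen for illustration, not ClickHouse settings. With `p = 12` the standard error is roughly 1.04 / sqrt(4096), about 1.6%.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (64-bit hash, 2**p registers)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant for m >= 128, from the HyperLogLog paper.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value) -> None:
        digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        w = 64 - self.p
        idx = h >> w                       # top p bits choose the register
        rest = h & ((1 << w) - 1)          # remaining w bits
        rank = w - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def suggest_low_cardinality(values, max_distinct: int = 10_000) -> bool:
    """Hypothetical rule: suggest low cardinality when the estimated NDV is small."""
    hll = HyperLogLog()
    for v in values:
        hll.add(v)
    return hll.estimate() < max_distinct
```

Because registers merge by element-wise `max`, per-part sketches can be combined without rereading data. That property is one reason HLL counts as a cheap statistic.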
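For the `cmsketch` item, a Count-Min sketch estimates how often a value occurs. Dividing that count by the row count gives a selectivity for `a = 1`. The following is a minimal sketch that assumes salted `blake2b` hashes as the per-row hash functions. The width and depth defaults are illustrative. With width `w` and depth `d`, an estimate exceeds the true count by at most `(e / w) * N` with probability at least `1 - e^(-d)`, where `N` is the total count. The sketch never undercounts.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: approximate per-value counts that never undercount."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0

    def _columns(self, value):
        # One hash per row; salting blake2b with the row index makes the
        # rows behave like (approximately) independent hash functions.
        data = str(value).encode()
        for row in range(self.depth):
            salt = row.to_bytes(16, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, value, count: int = 1) -> None:
        for row, col in self._columns(value):
            self.table[row][col] += count
        self.total += count

    def estimate(self, value) -> int:
        """Upper-bound estimate of how often `value` was added."""
        return min(self.table[row][col] for row, col in self._columns(value))

    def selectivity(self, value) -> float:
        """Estimated fraction of rows matching `a = value`."""
        return self.estimate(value) / self.total if self.total else 0.0
```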
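An equi-depth histogram puts roughly the same number of rows in every bucket. A skewed value therefore lands in a large bucket of its own instead of being averaged away. The sketch below is a minimal illustration with hypothetical names. It assumes values are uniformly distributed within a bucket when estimating `a = x`, and it never splits one value across two buckets.

```python
import bisect


class EquiDepthHistogram:
    """Equi-depth histogram: buckets hold (roughly) equal numbers of rows."""

    def __init__(self, values, num_buckets: int = 10):
        data = sorted(values)
        self.total = len(data)
        self.lowest = data[0] if data else None
        self.uppers = []    # inclusive upper bound of each bucket
        self.counts = []    # rows in each bucket
        self.distinct = []  # distinct values in each bucket
        size = max(1, -(-self.total // num_buckets))  # ceil(total / num_buckets)
        i = 0
        while i < self.total:
            j = min(i + size, self.total)
            # Never split one value across two buckets, so a very frequent
            # value stays in one (larger) bucket.
            while j < self.total and data[j] == data[j - 1]:
                j += 1
            bucket = data[i:j]
            self.uppers.append(bucket[-1])
            self.counts.append(len(bucket))
            self.distinct.append(len(set(bucket)))
            i = j

    def equality_selectivity(self, x) -> float:
        """Estimate the fraction of rows with a = x, uniform within a bucket."""
        if not self.uppers or x < self.lowest or x > self.uppers[-1]:
            return 0.0
        k = bisect.bisect_left(self.uppers, x)  # first bucket with upper >= x
        return self.counts[k] / self.distinct[k] / self.total
```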
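The sampling item ("store a configurable number of values") can be met with reservoir sampling (Algorithm R). This technique keeps a uniform random sample of exactly `k` values while reading the data once. The following is a minimal sketch. `reservoir_sample` and `sample_selectivity` are hypothetical names, and merging reservoirs from several parts is not covered because it would need weighting by part size. The sample can estimate any predicate by evaluating it on the sampled values.

```python
import random


def reservoir_sample(stream, k: int, seed=None) -> list:
    """Algorithm R: a uniform random sample of k items from a stream of any length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


def sample_selectivity(sample, predicate) -> float:
    """Estimate selectivity by evaluating the predicate on the sampled values."""
    if not sample:
        return 0.0
    return sum(1 for v in sample if predicate(v)) / len(sample)
```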
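The combinator item can be sketched as a two-step estimate for `a = value`. The first step looks the value up among the top-k heavy hitters. If the value is not there, the second step spreads the remaining non-NULL rows evenly over the remaining distinct values. This is a sketch under assumptions, not the proposal's design. The uniformity fallback and the parameter names are illustrative. It also assumes `ndv` comes from something like `hyperloglog` and `null_count` from the NULL counter.

```python
from collections import Counter


def top_k(values, k: int = 20) -> dict:
    """Heavy hitters: the k most frequent values with their exact counts."""
    return dict(Counter(values).most_common(k))


def estimate_equality(value, heavy: dict, total_rows: int, ndv: int, null_count: int = 0) -> float:
    """Estimate the selectivity of `a = value` by combining several statistics.

    heavy:      top-k value -> row count (heavy hitters)
    ndv:        number of distinct non-NULL values (e.g. from hyperloglog)
    null_count: NULL rows, which can never satisfy `a = value`
    """
    if total_rows == 0:
        return 0.0
    if value in heavy:
        # Step 1: the value is a heavy hitter, so its frequency is known.
        return heavy[value] / total_rows
    # Step 2: spread the remaining non-NULL rows evenly over the remaining
    # distinct values (uniformity assumption).
    rest_rows = total_rows - null_count - sum(heavy.values())
    rest_ndv = ndv - len(heavy)
    if rest_rows <= 0 or rest_ndv <= 0:
        return 0.0
    return rest_rows / rest_ndv / total_rows
```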
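Statistics name aliases could be handled by normalizing the spelling before lookup. This is a minimal sketch that assumes one rule: names are compared case-insensitively and with underscores ignored. `KNOWN_TYPES` only reuses the names mentioned in this issue and is not the actual list of supported statistics types.

```python
# Hypothetical canonical keys, taken from the names used in this issue.
KNOWN_TYPES = {"tdigest", "hyperloglog", "cmsketch", "minmax"}


def canonical_statistics_type(name: str) -> str:
    """Map spelling variants (min_max, MinMax, MIN_MAX) to one canonical key."""
    key = name.replace("_", "").lower()
    if key not in KNOWN_TYPES:
        raise ValueError(f"unknown statistics type: {name!r}")
    return key
```

A rule this loose also accepts spellings such as `t_digest`. A stricter design would keep an explicit alias table instead.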