[RFC] use statistic to order prewhere conditions better #53240
Hi @hanfei1991, I am interested in your PR. I have some questions and look forward to your answers.
Thanks.
CurtizJ
left a comment
LGTM as a first step. Let's continue working on tasks from the backlog.
@hanfei1991 I am going to add more statistic types. One question: how should we support multiple types of statistics, such as HyperLogLog and CM-sketch, in SQL?
solution 2 (incompatible with the current version)
solution 3
I prefer the 3rd one. Which do you prefer, or do you have a better one?
|
The 1st one is an amendment to the 3rd one, @JackyWoo.
What is statistic
An explanation from SQL Server: https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver16
We use tdigest as a practical histogram.
How to create / manipulate statistic in ClickHouse
We treat a statistic as a property of a table, like CODEC, TTL...
but we store and manipulate statistics separately, like INDEX, PROJECTION
create
manipulate
How to store it in a part
For every column that has statistics, we store a single file containing all types of statistics for that column.
and
how do we use statistic in where optimizer
A prewhere condition looks like a < 5 and b > 1 and c < 4.0; we then try to re-order its conjuncts according to their selectivity.
more work needed for this PR:
lessThan for TDigest
hope that I could keep the + lines within 2000 😺
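The re-ordering of prewhere conditions by selectivity described above can be sketched roughly as follows. This is only an illustration in Python, not the code from this PR: `ColumnStatistic`, `Condition`, `estimate_selectivity` and `order_prewhere` are hypothetical names, a sorted sample of values stands in for the t-digest, and it assumes the most selective condition (smallest estimated fraction of matching rows) should be evaluated first. Columns without a statistic are assumed to have selectivity 1.0.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


class ColumnStatistic:
    """Hypothetical stand-in for a t-digest: a sorted sample of column values."""

    def __init__(self, sample):
        self.sample = sorted(sample)

    def estimate_less_than(self, value):
        # Fraction of sampled values strictly below `value`.
        return bisect_left(self.sample, value) / len(self.sample)

    def estimate_greater_than(self, value):
        # Fraction of sampled values strictly above `value`.
        above = len(self.sample) - bisect_right(self.sample, value)
        return above / len(self.sample)


@dataclass
class Condition:
    column: str
    op: str  # only "<" and ">" are handled in this sketch
    value: float

    def __str__(self):
        return f"{self.column} {self.op} {self.value}"


def estimate_selectivity(cond, stats):
    """Estimated fraction of rows that pass `cond` (1.0 when unknown)."""
    stat = stats.get(cond.column)
    if stat is None:
        return 1.0  # assumption: no statistic means "filters nothing"
    if cond.op == "<":
        return stat.estimate_less_than(cond.value)
    if cond.op == ">":
        return stat.estimate_greater_than(cond.value)
    return 1.0


def order_prewhere(conditions, stats):
    """Put the conditions expected to pass the fewest rows first."""
    return sorted(conditions, key=lambda c: estimate_selectivity(c, stats))


# Made-up data: a and b hold 0..99, c holds 0.0..9.9 in steps of 0.1.
stats = {
    "a": ColumnStatistic(range(100)),
    "b": ColumnStatistic(range(100)),
    "c": ColumnStatistic([i / 10 for i in range(100)]),
}
conditions = [
    Condition("a", "<", 5),
    Condition("b", ">", 1),
    Condition("c", "<", 4.0),
]
ordered = order_prewhere(conditions, stats)
print(" and ".join(str(c) for c in ordered))
```

With this made-up data the estimates are 0.05 for a < 5, 0.98 for b > 1 and 0.4 for c < 4.0, so b > 1 moves to the end. A real optimizer would likely also weigh the cost of reading each column, which this sketch ignores.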
what to do in the future
#55065
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
use statistic to order prewhere conditions better
Documentation entry for user-facing changes