-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Secondary index application for MergeTree readers and early exit for order by+limit #65990
Description
Use case
Lets assume:
We have a query
SELECT user_id FROM table WHERE message like '%abcd%' ORDER BY event_time DESC LIMIT 10- There is a filter on
messagecolumn that we can't optimize using primary or secondary index(assumption). event_timeis not a part of a primary key, but we have minmax index for this column- We can not use
event_timeskip index to filter out data to be read
Describe the solution you'd like
New features:
Read from MergeTree in order of ranges based on part/granule statistics.
Allow to utilize secondary indexes in MergeTreeReader stage. Idea is to choose blocks to be read based on values of secondary index(minmax) for a column knowing ORDER BY for a query. It is not proposed to read in small portions i.e. making slow random reads, but in general if we will take ranges large enough for optimal read performance we can read some data first and then process next portion.
Early exit from OrderBy+Limit using watermarks.
(Assume we have ORDER BY x DESC).
If we read in order of data ranges we can report high watermark per reader basically reporting what it the largest value in the remaining unread data ranges based on minmax index.
When Limit processor has enough rows and minimum value in the result is larger than high watermark of all readers we can guarantee that processor can exit with complete result.
MergeTreeReader can filter ranges based on current state of OrderBy+Limit
Similar logic can be applied to stop reading some data ranges when we have guarantees that based on PK or secondary minmax index we will never select matching rows in some parts/granules/ranges.
Additional context
We are adding statistics and I expect more per columns part/granule statistics will be available soon without user action.
cc: @CurtizJ