Secondary index application for MergeTree readers and early exit for order by+limit

**Use case**

Lets assume:

We have a query

``` sql
SELECT user_id FROM table WHERE message like '%abcd%' ORDER BY event_time DESC LIMIT 10
```

- There is a filter on `message` column that we can't optimize using primary or secondary index(assumption).
- `event_time` is not a part of a primary key, but we have minmax index for this column
- We can not use `event_time` skip index to filter out data to be read


**Describe the solution you'd like**

New features:

### Read from MergeTree in order of ranges based on part/granule statistics.

Allow to utilize secondary indexes in MergeTreeReader stage. Idea is to choose blocks to be read based on values of secondary index(minmax) for a column knowing `ORDER BY` for a query. It is not proposed to read in small portions i.e. making slow random reads, but in general if we will take ranges large enough for optimal read performance we can read some data first and then process next portion.

### Early exit from OrderBy+Limit using watermarks.

(Assume we have ORDER BY x DESC).
If we read in order of data ranges we can report high watermark per reader basically reporting what it the largest value in the remaining unread data ranges based on minmax index.

When Limit processor has enough rows and minimum value in the result is larger than high watermark of all readers we can guarantee that processor can exit with complete result. 

### MergeTreeReader can filter ranges based on current state of OrderBy+Limit

Similar logic can be applied to stop reading some data ranges when we have guarantees that based on PK or secondary minmax index we will never select matching rows in some parts/granules/ranges.

**Additional context**

We are adding statistics and I expect more per columns part/granule statistics will be available soon without user action.


cc: @CurtizJ 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Secondary index application for MergeTree readers and early exit for order by+limit #65990

Read from MergeTree in order of ranges based on part/granule statistics.

Early exit from OrderBy+Limit using watermarks.

MergeTreeReader can filter ranges based on current state of OrderBy+Limit

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Secondary index application for MergeTree readers and early exit for order by+limit #65990

Description

Read from MergeTree in order of ranges based on part/granule statistics.

Early exit from OrderBy+Limit using watermarks.

MergeTreeReader can filter ranges based on current state of OrderBy+Limit

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions