Filtering by secondary indices should be performed on data reading

### Company or project name

ClickHouse

### Describe the situation

Currently, index analysis is done before the query starts reading the data.
It determines which granules to read by filtering out unnecessary granules.

This has the following disadvantages:
1. Even if a query will finish earlier due to LIMIT, a larger amount of the index has to be read.
2. Analysing index produces a visible delay before query processing: the progress bar does not show instantly.
3. Sometimes, reading indices can be heavier than reading the data, but we cannot make the decision of not reading an index dynamically.
4. The order in which different indices are processed cannot be adapted dynamically.

It also has some advantages, but all of them do not really matter:
1. It better estimates the number of records to read, and the pipeline (such as the number of processing lanes) can be adapted from this information.
2. EXPLAIN ESTIMATE shows more precise results but takes more time.



### Solution

When we are going to read a granule, check if there are index granules covering it. Read index granules first, and skip this granule if necessary.

We can collect statistics about how much time it takes to read index granules, and how large portions of the data every index filtered. These statistics can be used to avoid reading certain indices at all, and reading other indices in the optimal order.

These statistics can live only during the query run, there is no need to share them.

### Additional context

Probably index analysis also affects the choice of projections, but we don't want it to be too heavy. If it affects join order - we should better estimate it from runtime statistics rather than requiring index analysis in advance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filtering by secondary indices should be performed on data reading #75774

Company or project name

Describe the situation

Solution

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Filtering by secondary indices should be performed on data reading #75774

Description

Company or project name

Describe the situation

Solution

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions