Cache For Filters #67768

**Basic Idea**

Suppose a query has a highly selective WHERE condition that does not benefit from the table's indexes, and subsequent queries use the same condition.

We can remember, for each data part, which mark ranges contain rows satisfying that condition and which do not, and keep this as an ephemeral in-memory index. Subsequent queries will use this index to skip ranges where the condition is known not to hold.

**Implementation Proposal**

Add a `ChunkInfo` that records the table, data part, and mark range the chunk came from.

This applies only to `MergeTree` tables, but will naturally also work for `Merge` tables. It works only with Atomic and Replicated databases (databases whose tables have UUIDs).
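A minimal sketch of what such a chunk annotation could carry, assuming a hypothetical `ChunkInfo` with fields `table_uuid`, `part_name`, `mark_begin`, `mark_end` and a hypothetical helper `make_chunk_info` (none of these are existing ClickHouse names). It shows why the table UUID matters: without one, there is no stable table identity to key the cache on, so no info is produced.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ChunkInfo:
    """Hypothetical origin info attached to a chunk read from a MergeTree table."""
    table_uuid: UUID  # only available in Atomic/Replicated databases
    part_name: str    # data part the chunk was read from
    mark_begin: int   # first mark of the range (inclusive)
    mark_end: int     # end of the range (exclusive)

    def __post_init__(self) -> None:
        if self.mark_begin < 0 or self.mark_end <= self.mark_begin:
            raise ValueError("mark range must be non-empty and non-negative")


def make_chunk_info(table_uuid, part_name, mark_begin, mark_end):
    """Return a ChunkInfo, or None when the table has no UUID (cache not applicable)."""
    if table_uuid is None:
        return None
    return ChunkInfo(table_uuid, part_name, mark_begin, mark_end)


info = make_chunk_info(uuid4(), "all_1_1_0", 0, 8)  # example part name
```

Making the dataclass frozen keeps it hashable, so it can later be used directly as part of a cache key.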

Maintain an in-memory index data structure of the form:
`table -> data part -> mark range -> condition -> 0 or 1`
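The nested mapping above can be sketched as follows. This is an illustration under assumptions: the class `FilterCache`, its `put`/`get`/`drop_part` methods, and representing a condition as its text are all hypothetical. A miss returns `None`, distinct from a cached `0`.

```python
from typing import Dict, Optional, Tuple

MarkRange = Tuple[int, int]  # half-open [begin, end) range of marks


class FilterCache:
    """Hypothetical nested index: table -> data part -> mark range -> condition -> 0/1."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, Dict[MarkRange, Dict[str, int]]]] = {}

    def put(self, table: str, part: str, marks: MarkRange,
            condition: str, matched: bool) -> None:
        parts = self._tables.setdefault(table, {})
        ranges = parts.setdefault(part, {})
        ranges.setdefault(marks, {})[condition] = 1 if matched else 0

    def get(self, table: str, part: str, marks: MarkRange,
            condition: str) -> Optional[int]:
        """Return 1 or 0 if known, None on a miss; lookups never create entries."""
        return (
            self._tables.get(table, {})
            .get(part, {})
            .get(marks, {})
            .get(condition)
        )

    def drop_part(self, table: str, part: str) -> None:
        """Forget everything about a part, e.g. once it has been merged away."""
        self._tables.get(table, {}).pop(part, None)
```

Keying by data part works because MergeTree parts are immutable: a cached answer for a part stays valid until that part disappears, so invalidation reduces to dropping the part's subtree.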

When evaluating a WHERE or PREWHERE condition, check whether it is deterministic; if it is, look at the chunk info and update the cache.
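The update step might look like this sketch. The function `update_filter_cache`, the flat tuple key, and the `is_deterministic` flag are hypothetical stand-ins. It also assumes the filter mask covers every row of the chunk's mark range; if earlier filtering had already removed rows, "no row matched" would describe the combined filter, not this condition alone.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ChunkInfo:  # hypothetical, mirrors the proposal
    table: str
    part: str
    mark_begin: int
    mark_end: int


CacheKey = Tuple[str, str, int, int, str]


def update_filter_cache(cache: Dict[CacheKey, bool], info: Optional[ChunkInfo],
                        condition: str, is_deterministic: bool,
                        mask: Sequence[bool]) -> None:
    """Record whether any row in the chunk's mark range satisfied `condition`."""
    if not is_deterministic or info is None:
        # e.g. rand() or now(): the result may differ next time, so never cache it.
        return
    key = (info.table, info.part, info.mark_begin, info.mark_end, condition)
    cache[key] = any(mask)
```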

When running a query, consult the cache around `MergeTreeDataSelectExecutor` to skip mark ranges where the condition is known to be 0.
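On the read side, pruning could be sketched as below. The function `prune_mark_ranges` and the key layout are hypothetical. Only a cached 0 lets a range be skipped; a cached 1 or a miss means the range must still be read and filtered as usual. For simplicity it matches ranges exactly; a real version would have to handle ranges that only partially overlap cached ones.

```python
from typing import Dict, List, Tuple

MarkRange = Tuple[int, int]  # half-open [begin, end) range of marks
CacheKey = Tuple[str, str, MarkRange, str]


def prune_mark_ranges(cache: Dict[CacheKey, bool], table: str, part: str,
                      ranges: List[MarkRange], condition: str) -> List[MarkRange]:
    """Drop ranges the cache says contain no matching rows; keep hits and misses."""
    kept: List[MarkRange] = []
    for r in ranges:
        if cache.get((table, part, r, condition)) is False:
            continue  # condition was 0 for every row: skip reading this range
        kept.append(r)  # known 1 or unknown: must still be read and filtered
    return kept
```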

Add settings to control whether the cache is used during query analysis and whether it is updated during processing.
Add a SYSTEM command to flush this cache.
Add a server setting to control the maximum size of this cache, measured in number of cells.
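A size cap in cells, plus a flush operation, could be sketched like this. The class `BoundedFilterCache` is hypothetical, and least-recently-used eviction is an assumed policy: the proposal only specifies a maximum number of cells, not how to evict.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class BoundedFilterCache:
    """Hypothetical filter cache capped at `max_cells` cells, evicting LRU first."""

    def __init__(self, max_cells: int) -> None:
        if max_cells <= 0:
            raise ValueError("max_cells must be positive")
        self.max_cells = max_cells
        self._cells: "OrderedDict[Hashable, bool]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bool]:
        if key not in self._cells:
            return None
        self._cells.move_to_end(key)  # mark as most recently used
        return self._cells[key]

    def put(self, key: Hashable, matched: bool) -> None:
        self._cells[key] = matched
        self._cells.move_to_end(key)
        while len(self._cells) > self.max_cells:
            self._cells.popitem(last=False)  # evict the least recently used cell

    def flush(self) -> None:
        """What the proposed SYSTEM command would do: drop every cell."""
        self._cells.clear()

    def __len__(self) -> int:
        return len(self._cells)
```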

The cache is shared between users, and we don't mind the resulting side-channel information leak (whether another user has recently run a similar query).

**Additional Context**

It could also work for external data, such as the S3 table function, if we learn to use `etag` to detect whether an object has changed.
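One way the `etag` idea could work, as a sketch only: include the object's ETag in the cache key, so a changed object simply stops matching old entries rather than returning a stale answer. The function `lookup_external`, the `(path, etag, condition)` key, and the example path are hypothetical, and the granularity here is a whole object rather than a data part and mark range.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key for an external object: (path, etag, condition).
ExternalKey = Tuple[str, str, str]


def lookup_external(cache: Dict[ExternalKey, bool], path: str,
                    etag: Optional[str], condition: str) -> Optional[bool]:
    """Return the cached result for this exact object version, or None."""
    if etag is None:
        return None  # no version identifier: no cached result can be trusted
    return cache.get((path, etag, condition))
```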
