Query Cache #34011

@alexey-milovidov

Description

Use case

Provide a quick answer and avoid excessive computation when the same query is requested more than once in a short period of time.

(Un)marked checkboxes refer to #43797

Requirements

  • A setting (all settings are on the query level) for tunable cache usage with three options:

    • don't use the cache;
    • use the cache if data exists, but don't put new items into it (passive cache usage);
    • use the cache and put new items into it.

  • A setting for the maximum item size to be put in the cache, e.g. put only results no larger than 1 MB and no more than 100,000 records. It should not prevent streaming queries: we can stream data into the cache and discard the stored copy if it turns out to be too large. The limit is expressed:

    • in number of records
    • in number of bytes
  • A setting to control the maximum lifetime of cached data, if it is not invalidated earlier.

  • A setting to control what cached data will be accepted: not older than a specified time; not older than a specified time after the related tables were last modified. Note that we will not track every possible modification of the related tables.

  • A setting to enable or disable cache sharing between users as an extra security measure. If sharing is disabled, the user name is included in the cache key.

  • The cache can be stored both in memory and in the filesystem. There should be separate knobs to control the maximum size of each part.

  • Data in memory is stored as a sequence of Blocks with optional compression (as in Memory tables). Data on disk is stored in files in a dedicated directory (e.g. query_cache), one file per item, named with the hash of the cache key (UInt128), with control over the maximum number of files. Files are placed into subdirectories by the first 12 bits of the hash. The files are compressed (and checksummed) as usual, with lz4 by default and a setting to change the codec.

  • Instead of being evicted, data is moved from memory to the filesystem and put back on first access.

  • A setting to control whether we put into and read from the cache the results of queries that contain non-deterministic functions such as rand or now.

  • A setting for the minimum number of query runs before the result is put in the cache, e.g. cache the result only on the 2nd run of the same query.

  • A setting to put the result in the cache only if the query was expensive to run (time to first result block?).

  • The cache key should use the hashed query AST or query pipeline, plus every setting that may affect query behavior. As this can be difficult to determine, it is simpler to hash all the settings.

  • Table and column access control is always performed before going to the cache.

  • A setting to enable or disable caching for secondary queries of distributed queries and for dictionaries.

  • The cache eviction strategy can be a simple LRU, or it can include a priority setting to roughly balance cache usage in some proportion based on priorities.

  • Per-user limits on total usable cache size.

  • Allow enabling the cache for subqueries. Allow caching data before LIMIT by speculatively fetching more records: e.g. if a user runs a query with LIMIT 10, we can fetch 1000 records and then serve a subsequent query with LIMIT 10, 10 from the cache.

  • Synchronize concurrent queries that can reuse each other's results by stealing blocks of data that are about to be put in the cache.

  • System commands to drop query cache.

  • Invalidation of the cache in the filesystem during a server upgrade is not necessary.
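
To make the cache-key requirement concrete, here is a minimal sketch in Python (illustrative only, not ClickHouse's implementation) of hashing the query together with all settings into a UInt128, including the user name when cache sharing between users is disabled. The function name `cache_key` and its parameters are hypothetical:

```python
import hashlib

def cache_key(query_ast_text, settings, user=None):
    """Illustrative 128-bit cache key: hash the normalized query together
    with every setting (simpler than deciding which settings affect query
    behavior) and, if per-user isolation is enabled, the user name."""
    h = hashlib.blake2b(digest_size=16)  # 16 bytes = UInt128
    h.update(query_ast_text.encode())
    for name in sorted(settings):        # stable order: equal settings hash equally
        h.update(f"{name}={settings[name]}".encode())
    if user is not None:                 # sharing disabled: user is part of the key
        h.update(user.encode())
    return int.from_bytes(h.digest(), "big")
```

Hashing all settings over-invalidates (changing an irrelevant setting misses the cache) but never serves a wrong result, which matches the "better to simply hash all the settings" trade-off above.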
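
The on-disk layout described above (one file per item, subdirectories by the first 12 bits of the UInt128 hash) can be sketched as follows; the helper name and `.bin` extension are assumptions for illustration:

```python
def cache_file_path(root, key):
    """Place each cached item under a subdirectory derived from the top
    12 bits of its 128-bit key, giving up to 4096 subdirectories so no
    single directory accumulates too many files."""
    subdir = format(key >> 116, "03x")   # top 12 of 128 bits -> 3 hex digits
    return f"{root}/{subdir}/{key:032x}.bin"
```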
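
The "stream into cache, discard if too large" behavior from the max-item-size requirement could look like this sketch (class and method names are hypothetical): result blocks always flow to the client, and only the cached copy is dropped once either cap is exceeded:

```python
class StreamingCacheWriter:
    """Sketch: accumulate result blocks for the cache while streaming them
    to the client; abandon the cached copy once a size cap is hit."""
    def __init__(self, max_bytes=1_000_000, max_rows=100_000):
        self.max_bytes, self.max_rows = max_bytes, max_rows
        self.blocks, self.bytes, self.rows = [], 0, 0
        self.discarded = False

    def on_block(self, block, row_count):
        if not self.discarded:
            self.bytes += len(block)
            self.rows += row_count
            if self.bytes > self.max_bytes or self.rows > self.max_rows:
                self.blocks, self.discarded = [], True  # too big: stop caching
            else:
                self.blocks.append(block)
        return block  # the result still streams to the client unchanged

    def result_to_cache(self):
        """Blocks to store, or None if the result exceeded a cap."""
        return None if self.discarded else self.blocks
```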
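
For the eviction requirement, the "simple LRU" baseline (without the optional priority weighting) can be sketched with an ordered map; this is a generic LRU, not ClickHouse's actual eviction code:

```python
from collections import OrderedDict

class LRUQueryCache:
    """Minimal LRU sketch: the least recently used entries are dropped
    first once the byte budget is exceeded."""
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.entries = OrderedDict()  # key -> cached result bytes
        self.size = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        return None

    def put(self, key, value):
        if key in self.entries:
            self.size -= len(self.entries.pop(key))
        self.entries[key] = value
        self.size += len(value)
        while self.size > self.max_bytes:  # evict least recently used
            _, old = self.entries.popitem(last=False)
            self.size -= len(old)
```

A priority-aware variant could keep one such structure per priority class and divide the byte budget between them in the configured proportion.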

Additional context

MySQL used to have a query cache, but it was so terrible that it was deprecated and finally removed.

PostgreSQL does not have a query cache, although there are third-party tools for that.

A ClickHouse query cache is already available in chproxy.
