
Implement Query Condition Cache#69236

Merged
alexey-milovidov merged 66 commits into ClickHouse:master from zhongyuankai:query_condition_cache
Mar 3, 2025

Conversation

@zhongyuankai
Contributor

@zhongyuankai zhongyuankai commented Sep 4, 2024

Implement a query condition cache to improve the performance of queries with repeated conditions. The ranges of data that do not match the condition are remembered as a temporary in-memory index, and subsequent queries use this index to skip them. Closes #67768.
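For illustration, the caching idea can be sketched roughly as follows (a hypothetical Python sketch; the `QueryConditionCache` class, its methods, and the mark bookkeeping are illustrative assumptions, not the actual C++ implementation):

```python
# Hypothetical sketch of a query condition cache: for each (table, part,
# predicate) it remembers which marks matched nothing, so repeated queries
# with the same predicate can skip those marks entirely.
# All names and granularities here are illustrative, not ClickHouse's API.

class QueryConditionCache:
    def __init__(self):
        # key: (table_uuid, part_name, predicate_hash) -> {mark_idx: bool};
        # False means "no row in this mark matched the predicate".
        self._entries = {}

    def update(self, key, mark_idx, has_matches):
        marks = self._entries.setdefault(key, {})
        marks[mark_idx] = has_matches

    def matching_marks(self, key, total_marks):
        """Return the mark indexes a scan still has to read."""
        marks = self._entries.get(key, {})
        # Unknown marks must be read; known non-matching marks are skipped.
        return [i for i in range(total_marks) if marks.get(i, True)]

cache = QueryConditionCache()
key = ("table-uuid", "all_1_1_0", hash("b = 10000"))
for mark in range(8):
    cache.update(key, mark, has_matches=(mark == 3))
print(cache.matching_marks(key, total_marks=8))  # -> [3]
```

A second query with the same predicate would only read mark 3 instead of all eight.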

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@tavplubix tavplubix added the `can be tested` label (Allows running workflows for external contributors) Sep 6, 2024
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the `pr-performance` label (Pull request with some performance improvements) Sep 6, 2024
@robot-ch-test-poll2
Contributor

robot-ch-test-poll2 commented Sep 6, 2024

This is an automated comment for commit 55b5799 with a description of existing statuses. It is updated for the latest CI run.

❌ Failed checks (full report available on a separate page)

| Check name | Description | Status |
|---|---|---|
| Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ❌ failure |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure |
✅ Successful checks

| Check name | Description | Status |
|---|---|---|
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help | ✅ success |
| Builds | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| ClickBench | Runs ClickBench with instant-attach table | ✅ success |
| Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docs check | Builds and tests the documentation | ✅ success |
| Fast test | Normally this is the first check that is run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success |
| Flaky tests | Checks whether newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails at least once or runs too long, this check will be red. We don't allow flaky tests; read the doc | ✅ success |
| Install packages | Checks that the built packages are installable in a clean environment | ✅ success |
| Performance Comparison | Measures changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success |
| Style check | Runs a set of checks to keep the code style clean. If some tests fail, see the related log from the report | ✅ success |
| Unit tests | Runs the unit tests for different release types | ✅ success |
| Upgrade check | Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks whether the new server can start up without errors, crashes, or sanitizer asserts | ✅ success |

@nickitat nickitat self-assigned this Nov 21, 2024
@nickitat
Member

Please merge with master and let's run CI with use_query_condition_cache=true to make sure it doesn't cause any issues.

@zhongyuankai
Contributor Author

@nickitat Enabling use_query_condition_cache=true causes some tests that check skip indexes to fail (https://s3.amazonaws.com/clickhouse-test-reports/69236/7858f72d3abefbfaf301f16e86b8ceaea4561c5d/fast_test.html). This looks expected, so I set the default to false; it can be set to true if necessary.

@nickitat
Member

nickitat commented Nov 25, 2024

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Mar 3, 2025
@rschu1ze rschu1ze removed this pull request from the merge queue due to a manual request Mar 3, 2025
@rschu1ze
Member

rschu1ze commented Mar 3, 2025

Sorry for removing this PR from the merge queue.

@zhongyuankai Could you please take another look at my comments from 28 Nov? I suppose a less fragile implementation needs to be a lot more encapsulated (i.e. restricted to ReadFromMergeTree) + without optimizer pass.

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Mar 3, 2025
@alexey-milovidov
Member

alexey-milovidov commented Mar 3, 2025

@rschu1ze, you hadn't reviewed this pull request since November, so I had to pick it up.

Merged via the queue into ClickHouse:master with commit 0066dbb Mar 3, 2025
124 checks passed
@robot-clickhouse robot-clickhouse added the `pr-synced-to-cloud` label (The PR is synced to the cloud repo) Mar 3, 2025
baibaichen added a commit to Kyligence/gluten that referenced this pull request Mar 4, 2025
@zhongyuankai
Contributor Author

> @zhongyuankai Could you please take another look at my comments from 28 Nov? I suppose a less fragile implementation needs to be a lot more encapsulated (i.e. restricted to ReadFromMergeTree) + without optimizer pass.

@rschu1ze OK, I've thought about this question, but it's a bit difficult to implement. I'll read the relevant code, and if it works out I'll submit a new pull request. If you have more detailed ideas, please let me know; I am willing to implement them.

baibaichen added a commit to apache/gluten that referenced this pull request Mar 4, 2025
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250304)

* Fix ut due to ClickHouse/ClickHouse#69236

---------

Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang Chen <baibaichen@gmail.com>
@rschu1ze
Member

@zhongyuankai With latest master (that contains #77280 and #77293 on top of this PR), when I run

```sql
CREATE TABLE tab (a Int64, b Int64) ENGINE = MergeTree ORDER BY a;
INSERT INTO tab SELECT number, number FROM numbers(1000000);
SELECT count(*) FROM tab WHERE b = 10000 SETTINGS use_query_condition_cache = true;
SELECT count(*) FROM tab WHERE b = 10000 SETTINGS use_query_condition_cache = true;
```

then the second SELECT prints this into the log:

```
2025.03.10 21:36:53.014580 [ 1817141 ] {df969aa2-8f01-410a-8f70-7df204eeb740} <Debug> QueryConditionCache: Read entry for table_uuid: d9ef87b2-d813-4dab-ba90-b218172dd7f6, part: all_1_1_0, predicate_hash: 5456494897146899690, ranges: [1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
```

Marks with values = 0 can be skipped.

One would expect something like

```
[0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
```

Can you explain this behavior?

@zhongyuankai
Contributor Author

> then the second SELECT prints this into the log:
>
> 2025.03.10 21:36:53.014580 [ 1817141 ] {df969aa2-8f01-410a-8f70-7df204eeb740} <Debug> QueryConditionCache: Read entry for table_uuid: d9ef87b2-d813-4dab-ba90-b218172dd7f6, part: all_1_1_0, predicate_hash: 5456494897146899690, ranges: [1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
>
> Marks with values = 0 can be skipped.
>
> One would expect something like
>
> [0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
>
> Can you explain this behavior?

@rschu1ze The reason for this behavior is that a single read call in MergeTreeRangeReader reads multiple marks in succession, so the resulting Chunk spans multiple marks; in FilterTransform, the query condition cache is updated only if all the data in the Chunk is filtered out.
To get the result you expect, we would have to read only one mark at a time, but that would certainly degrade performance.
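The effect can be illustrated with a small sketch (hypothetical Python; the chunk size, rows-per-mark, and function names are illustrative assumptions, not taken from the ClickHouse code):

```python
# Sketch of why the cached bitmap is coarser than one might expect:
# the filter step sees whole chunks, and a chunk produced by one read call
# spans several marks. Only when *every* row in the chunk is filtered out
# can all of its marks be recorded as non-matching (0); a chunk containing
# even one matching row leaves all of its marks as 1.
# Numbers and names are illustrative, not taken from ClickHouse.

ROWS_PER_MARK = 8192
MARKS_PER_CHUNK = 8          # one read call covers several marks

def cache_bitmap(total_marks, matching_row):
    bitmap = []
    for first in range(0, total_marks, MARKS_PER_CHUNK):
        chunk_marks = range(first, min(first + MARKS_PER_CHUNK, total_marks))
        chunk_rows = range(first * ROWS_PER_MARK,
                           chunk_marks[-1] * ROWS_PER_MARK + ROWS_PER_MARK)
        chunk_has_match = matching_row in chunk_rows
        # The whole chunk keeps 1s if any row in it survived the filter.
        bitmap.extend([1 if chunk_has_match else 0] * len(chunk_marks))
    return bitmap

# Row 10000 falls into mark 1, i.e. the first chunk of 8 marks,
# reproducing the leading run of 1s seen in the log above:
print(cache_bitmap(total_marks=24, matching_row=10000))
# -> [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Reading one mark per chunk would make the bitmap exact at the cost of smaller reads, which is the trade-off described above.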

@zhanglistar
Contributor

@rschu1ze Will you update the ClickBench results after including this PR? The hot-run results may be very good.

@rschu1ze
Member

rschu1ze commented Apr 7, 2025

@zhanglistar Happy to do so, but we should first enable the cache by default (--> PR). Before doing that, it would be nice to merge this other PR for better debugging.


```cpp
if (const auto & prewhere_info = select_query_info.prewhere_info)
{
    for (const auto * dag : prewhere_info->prewhere_actions.getOutputs())
```
Contributor

What about PrewhereInfo::row_level_filter? It does not look OK to ignore it: some marks might have been skipped because of it, and subsequent queries with the same PREWHERE condition will skip those marks even if the row policy is no longer there.

I just ran into this problem in one of my PRs: #85118. The test file can be used as a repro by removing the cache=0. Bear in mind this repro works only with a build produced by that PR.
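The hazard, and one way to avoid it, can be sketched as follows (hypothetical Python; `cache_key` and its parameters are illustrative, not the actual ClickHouse code):

```python
# Sketch of the hazard described above: if the cache key hashes only the
# PREWHERE condition and ignores an active row-level filter (row policy),
# marks skipped because of the policy would poison later queries that run
# the same PREWHERE without the policy. Hashing both conditions into the
# key keeps the entries separate. Names are illustrative, not ClickHouse code.

def cache_key(table_uuid, part, prewhere, row_level_filter=None):
    # Including the row-level filter in the hash separates entries written
    # under a row policy from entries written without one.
    return (table_uuid, part, hash((prewhere, row_level_filter)))

with_policy = cache_key("uuid", "all_1_1_0", "b = 10000", "user_id = 42")
without_policy = cache_key("uuid", "all_1_1_0", "b = 10000")
print(with_policy == without_policy)  # -> False: the entries do not collide
```

With separate keys, a cache entry recorded under a row policy can never cause a query without that policy to skip marks it should have read.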


Labels

  • `can be tested` (Allows running workflows for external contributors)
  • `pr-performance` (Pull request with some performance improvements)
  • `pr-synced-to-cloud` (The PR is synced to the cloud repo)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cache For Filters