Implement Query Condition Cache#69236
Conversation
|
This is an automated comment for commit 55b5799 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
|
pls merge with master and let's run CI with |
src/Processors/QueryPlan/Optimizations/tryUpdateQueryConditionCache.cpp
Outdated
Show resolved
Hide resolved
src/Processors/QueryPlan/Optimizations/tryUpdateQueryConditionCache.cpp
Outdated
Show resolved
Hide resolved
|
@nickitat |
|
pls check crash reports, e.g. the one at the very bottom of https://s3.amazonaws.com/clickhouse-test-reports/69236/ad1b4c1d19368b4b5a23bfdacb50813c482f7d3e/stateful_tests__asan_/clickhouse-server.err.log |
908a42a to
db2e9d1
Compare
db2e9d1 to
ca4d835
Compare
|
Sorry for removing this PR from the merge queue. @zhongyuankai Could you please take another look at my comments from 28 Nov? I suppose a less fragile implementation needs to be a lot more encapsulated (i.e. restricted to ReadFromMergeTree) + without optimizer pass. |
|
@rschu1ze, you didn't review this pull request since November, so I had to pick up. |
@rschu1ze OK, I've thought about this question, but it's a bit difficult to implement, I'll read the relevant code and if it works I'll resubmit a pull request, If you have more detailed ideas please let me know, I am willing to implement it. |
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250304) * Fix ut due to ClickHouse/ClickHouse#69236 --------- Co-authored-by: kyligence-git <gluten@kyligence.io> Co-authored-by: Chang Chen <baibaichen@gmail.com>
Minor follow-up to #69236
|
@zhongyuankai With latest CREATE TABLE tab (a Int64, b Int64) ENGINE = MergeTree ORDER BY a;
INSERT INTO tab SELECT number, number FROM numbers(1000000);
SELECT count(*) FROM tab WHERE b = 10000 SETTINGS use_query_condition_cache = true;
SELECT count(*) FROM tab WHERE b = 10000 SETTINGS use_query_condition_cache = true;then the second Marks with values = One would expect something like Can you explain this behavior? |
@rschu1ze The reason for this behavior is that a single |
|
@rschu1ze Will you update result of clickbench after including this PR? I mean, maybe result of hot run is very good. |
|
@zhanglistar Happy to do so but we should first enable the cache by default (--> PR). Before doing this, it would be nice to merge this other PR for better debugging first. |
|
|
||
| if (const auto & prewhere_info = select_query_info.prewhere_info) | ||
| { | ||
| for (const auto * dag : prewhere_info->prewhere_actions.getOutputs()) |
There was a problem hiding this comment.
What about PrewhereInfo::row_level_filter? It does not look ok to ignore it. Some marks might have been skipped because of it and the following queries that have the same prewhere condition will skip those marks even if the row policy is no longer there.
I just ran into this problem in one of my PRs: #85118. The test file can be used as a repro by removing the cache=0. Bear in mind this repro works only with a build produced by that PR
Implement query condition cache to improve query performance using repeated conditions. The range of the portion of data that does not meet the condition is remembered as a temporary index in memory. Subsequent queries will use this index. close #67768
Changelog category (leave one):