
Query condition cache: introduce selectivity threshold#86076

Merged
rschu1ze merged 5 commits into ClickHouse:master from zhongyuankai:qcc_improvement
Sep 17, 2025

Conversation

@zhongyuankai
Contributor

@zhongyuankai zhongyuankai commented Aug 23, 2025

This is a revert of #85965, which reverted #85253 because of a performance regression.

This PR improves performance by reducing lock contention.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added setting query_condition_cache_selectivity_threshold (default value: 1.0) which excludes scan results of predicates with low selectivity from insertion into the query condition cache. This makes it possible to reduce the memory consumption of the query condition cache at the cost of a worse cache hit rate.
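A minimal sketch of the gate the changelog entry describes, with illustrative names (not ClickHouse's actual API): a predicate's scan result is cached only if the fraction of matching marks does not exceed the threshold, so the default of 1.0 caches everything.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical selectivity gate: cache the scan result only if the fraction
// of potentially-matching marks does not exceed the configured threshold.
// With the default threshold of 1.0, every result is cached.
bool shouldCacheScanResult(const std::vector<bool> & matching_marks, double selectivity_threshold)
{
    if (matching_marks.empty())
        return true;
    std::size_t matching = 0;
    for (bool mark : matching_marks)
        matching += mark ? 1 : 0;
    double selectivity = static_cast<double>(matching) / matching_marks.size();
    // A predicate that filters out almost nothing (selectivity close to 1.0)
    // produces a cache entry that prunes little data; skip caching it.
    return selectivity <= selectivity_threshold;
}
```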

@zhongyuankai

This comment was marked as resolved.

@rschu1ze rschu1ze self-assigned this Aug 25, 2025
@rschu1ze rschu1ze added the can be tested Allows running workflows for external contributors label Aug 25, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Aug 25, 2025

Workflow [PR], commit [557014a]

Summary:

job_name: Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel)
status: failure
failed test: 01201_read_single_thread_in_order (FAIL)

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Aug 25, 2025
@rschu1ze

This comment was marked as resolved.

@rschu1ze

This comment was marked as resolved.

@zhongyuankai zhongyuankai force-pushed the qcc_improvement branch 2 times, most recently from deb1803 to ced5a0e Compare August 27, 2025 02:27
@rschu1ze rschu1ze force-pushed the qcc_improvement branch 2 times, most recently from 47386db to e088409 Compare August 28, 2025 17:05
/// Marks potentially need to be updated. We don't know for sure because another thread could have updated them in the meantime.
/// Acquire a write lock and attempt the update.
bool is_insert = false;
if (!need_update_marks)
Member


I don't understand why we need to code in l. 146-172. Could you explain a bit more? Is it really needed for correctness or performance?

Contributor Author


  1. When the query QPS is very high and the query condition cache already contains the complete result, there is no need to continue collecting matching marks. This helps avoid frequent acquisition of the mutex and the cache_entry->mutex write lock, which improves performance.
  2. For queries with LIMIT, execution may terminate early before all parts are fully scanned, leaving incomplete results in the query condition cache. This piece of code fills in the missing marks.

@zhongyuankai zhongyuankai force-pushed the qcc_improvement branch 3 times, most recently from 9cc4598 to 1f86902 Compare August 29, 2025 06:41
@zhongyuankai

This comment was marked as resolved.

@rschu1ze

This comment was marked as resolved.

@zhongyuankai

This comment was marked as resolved.

@rschu1ze
Member

@zhongyuankai Sorry for the delay. I have too many other things going on. I'll try to check today.

@rschu1ze
Member

@zhongyuankai I needed to simplify the logic in QueryConditionCacheWriter::addRanges because it looked too complex and I did not understand it (in particular the new dirty bit). Please let me know if you think the rewritten logic is wrong in any way.

The PR message says:

This is a revert of #85965 which reverted #85253 because of a performance regression.

This time I wanted to be more careful with regards to performance regressions.

I ran ClickBench on a current master (on my 48 vCPUs system):

[0.019, 0.001, 0.001],
[0.102, 0.006, 0.006],
[0.045, 0.012, 0.012],
[0.719, 0.015, 0.015],
[1.600, 0.354, 0.337],
[1.785, 0.301, 0.257],
[0.014, 0.007, 0.007],
[0.022, 0.009, 0.010],
[1.502, 0.365, 0.362],
[1.932, 0.379, 0.380],
[0.672, 0.108, 0.104],
[1.265, 0.117, 0.114],
[2.474, 0.430, 0.428],
[3.478, 0.428, 0.421],
[2.130, 0.342, 0.357],
[0.952, 0.471, 0.466],
[4.193, 1.405, 1.335],
[3.365, 0.480, 0.488],
[7.972, 2.592, 2.478],
[0.094, 0.006, 0.006],
[18.434, 0.174, 0.172],
[20.875, 0.061, 0.064],
[25.084, 0.284, 0.275],
[12.371, 0.726, 0.716],
[4.528, 0.086, 0.086],
[2.214, 0.077, 0.079],
[5.191, 0.085, 0.087],
[18.534, 0.412, 0.401],
[15.191, 1.974, 2.013],
[0.058, 0.031, 0.031],
[3.679, 0.183, 0.185],
[9.791, 0.324, 0.327],
[8.869, 3.239, 3.030],
[19.144, 1.970, 1.912],
[19.125, 1.966, 1.908],
[0.298, 0.247, 0.109],
[0.060, 0.038, 0.038],
[0.037, 0.024, 0.024],
[0.040, 0.018, 0.019],
[0.085, 0.064, 0.065],
[0.032, 0.012, 0.013],
[0.029, 0.010, 0.010],
[0.023, 0.008, 0.008],

Then I ran ClickBench on this PR:

[0.034, 0.001, 0.001],
[0.144, 0.006, 0.006],
[0.044, 0.012, 0.012],
[0.743, 0.015, 0.015],
[1.598, 0.338, 0.362],
[1.784, 0.251, 0.250],
[0.015, 0.007, 0.007],
[0.027, 0.009, 0.008],
[1.540, 0.380, 0.364],
[1.969, 0.382, 0.388],
[0.673, 0.108, 0.107],
[1.307, 0.118, 0.119],
[2.452, 0.421, 0.416],
[3.472, 0.410, 0.406],
[2.114, 0.231, 0.356],
[1.019, 0.433, 0.455],
[4.170, 1.343, 1.331],
[3.350, 0.470, 0.484],
[8.086, 2.720, 2.479],
[0.085, 0.018, 0.006],
[18.447, 0.176, 0.314],
[20.752, 0.371, 0.362],
[24.601, 0.509, 0.522],
[12.495, 0.753, 0.707],
[4.560, 0.086, 0.086],
[2.238, 0.077, 0.074],
[5.170, 0.087, 0.091],
[18.588, 0.410, 0.404],
[15.200, 1.978, 1.999],
[0.066, 0.031, 0.031],
[3.666, 0.185, 0.189],
[9.781, 0.318, 0.317],
[8.817, 3.197, 3.176],
[19.146, 1.626, 1.924],
[19.113, 1.653, 1.608],
[0.154, 0.111, 0.234],
[0.061, 0.038, 0.038],
[0.043, 0.027, 0.023],
[0.041, 0.022, 0.018],
[0.091, 0.063, 0.060],
[0.038, 0.013, 0.012],
[0.029, 0.010, 0.010],
[0.025, 0.008, 0.009],

No performance deterioration.

PS: Let's try to avoid force-pushing PRs once someone reviewed them. A force-push potentially rewrites the complete commit history and the reviewer needs to start reviewing from scratch (in principle).

@zhongyuankai
Contributor Author

@zhongyuankai I needed to simplify the logic in QueryConditionCacheWriter::addRanges because it looked too complex and I did not understand it (in particular the new dirty bit). Please let me know if you think the rewritten logic is wrong in any way.

The introduction of the dirty bit is mainly to avoid fetching results from the query cache every time. Instead, we fetch once and cache them in new_entries. If marks are updated during execution, we set the dirty flag and write them back to the query condition cache at the end. Perhaps we can first address the performance issue and then work on simplifying the code.
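The dirty-bit scheme described here can be sketched as follows, with hypothetical names (not the actual QueryConditionCacheWriter): the entry is fetched from the shared cache once and buffered locally, updates set a dirty flag, and the write-back happens only if something actually changed, so a query that finds the cached result already complete never writes.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

using Marks = std::vector<bool>; // true = mark may match

// Hypothetical shared cache; write_count tracks expensive write-backs.
struct SharedCache
{
    std::unordered_map<std::size_t, Marks> entries;
    std::size_t write_count = 0;

    void write(std::size_t key, const Marks & marks)
    {
        entries[key] = marks;
        ++write_count;
    }
};

struct Writer
{
    std::size_t key;
    Marks local_marks; // fetched once, mutated locally
    bool dirty = false;

    Writer(const SharedCache & cache, std::size_t key_, std::size_t num_marks)
        : key(key_)
    {
        auto it = cache.entries.find(key);
        local_marks = (it != cache.entries.end()) ? it->second : Marks(num_marks, true);
    }

    void clearRange(std::size_t begin, std::size_t end)
    {
        for (std::size_t i = begin; i < end; ++i)
            if (local_marks[i])
            {
                local_marks[i] = false;
                dirty = true; // remember that a write-back is needed
            }
    }

    void finalize(SharedCache & cache)
    {
        if (dirty)
            cache.write(key, local_marks); // skipped when nothing changed
    }
};
```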

No performance deterioration.

I think the key to improving performance is to leverage the results already cached in the query cache to check whether the marks need to be updated. Under high QPS, the query condition cache already contains the complete results, so there is no need to write them repeatedly, which also helps avoid frequent write lock acquisitions.

If only this PR #80247 is introduced, it will not improve performance, as your tests have shown, because marks almost always need to be updated.

Perhaps you could rerun ClickBench based on this commit; my understanding is that it brings significant performance improvements.

PS: Let's try to avoid force-pushing PRs once someone reviewed them. A force-push potentially rewrites the complete commit history and the reviewer needs to start reviewing from scratch (in principle).

Okay, I understand.

@rschu1ze
Member

@zhongyuankai Thanks for the reply.

The introduction of the dirty bit is mainly to avoid fetching results from the query cache every time.

I wonder where in the current PR (my last commit) we "fetch results from the query cache every time". Do you mean function QueryConditionCache::read?

I think the key to improving performance is to leverage the results already cached in the query cache to check whether the marks need to be updated. Under high QPS, the query condition cache already contains the complete results, so there is no need to write them repeatedly, which also helps avoid frequent write lock acquisitions.

This confuses me. As far as I understand, each query has its own QueryConditionCacheWriter. At the end of the query, QueryConditionCacheWriter::finalize is called and the new entry is persisted in the cache. What you describe makes sense if, during ::finalize, an entry with the same key already exists in the cache. That would contradict the assumption that each query has its own QueryConditionCacheWriter.

If only this PR #80247 is introduced, it will not improve performance, as your tests have shown, because marks almost always need to be updated.

The goal of this PR is not to improve performance; the goal is to not deteriorate performance compared to current master (which was the reason this same PR was previously reverted). To implement further performance improvements, we can make follow-up PRs on top.

@rschu1ze rschu1ze added this pull request to the merge queue Sep 17, 2025
Merged via the queue into ClickHouse:master with commit 1f99292 Sep 17, 2025
121 of 123 checks passed
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Sep 17, 2025
@zhongyuankai
Contributor Author

@rschu1ze Thanks for the reply.

I wonder where in the current PR (my last commit) we "fetch results from the query cache every time". Do you mean function QueryConditionCache::read?

No, I was referring to this line of code

This confuses me. As far as I understand, each query has its own QueryConditionCacheWriter. At the end of the query, QueryConditionCacheWriter::finalize is called and the new entry is persisted in the cache. What you describe makes sense if - during ::finalize, an entry with the same key exists in the cache already. That would contradict the assumption that each query has its own QueryConditionCacheWriter.

I mainly want to address the case where an entry with the same key already exists in the cache. In QueryConditionCacheWriter::addRanges, I aim to minimize the time the write lock is held. I will submit a PR for this.

This seems somewhat contradictory. Do you have a better idea?

@rschu1ze
Member

Nevermind, let's do this in a separate PR (you can add me as reviewer). Thanks!

@rschu1ze
Member

According to internal performance testing, this PR introduced regressions in ClickBench queries Q20 (by 2x), Q21 (by 8x) and Q22 (by 3x) despite the lockless implementation (compared to the original implementation #86076) and my performance testing (here). I have no idea what caused this and how to check deeper (perhaps try to reproduce with ClickBench again, maybe #87337 will help), but for now I need to revert this PR as it blocks the 25.9 release.

@zhongyuankai
Contributor Author

According to internal performance testing, this PR introduced regressions in ClickBench queries Q20 (by 2x), Q21 (by 8x) and Q22 (by 3x) despite the lockless implementation (compared to the original implementation #86076) and my performance testing (here). I have no idea what caused this and how to check deeper (perhaps try to reproduce with ClickBench again, maybe #87337 will help), but for now I need to revert this PR as it blocks the 25.9 release.

I think introducing PR #87337 can resolve the performance regression.

@zhongyuankai
Contributor Author

@rschu1ze Can we run ClickBench again based on PR #87337?

@rschu1ze
Member

rschu1ze commented Sep 23, 2025

I think introducing PR #87337 can resolve the performance regression.

Could be. #87337 will nevertheless need some time to review and test. We can do it on top of a revert of the revert (which I am preparing right now).

@rschu1ze
Member

@rschu1ze Can we run ClickBench again based on PR #87337?

Sure we can. What confused me is that our internal ClickBench tests showed a regression but my testing (above) did not.

@zhongyuankai
Contributor Author

I think introducing PR #87337 can resolve the performance regression.

Could be. #87337 will nevertheless need some time to review and test. We can do it on top of a revert of the revert (which I am preparing right now).

Ok.

@alexey-milovidov
Member

Is there any real scenario in which the memory consumption of the query condition cache is a problem?

@rschu1ze
Member

Is there any real scenario in which the memory consumption of the query condition cache is a problem?

There is none. The real point of this PR is its new QueryConditionCacheWriter abstraction. It gets rid of dirty in-place modification of existing cache entries in QueryConditionCache::write (cache entries are supposed to be immutable).
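The "immutable cache entries" discipline mentioned here can be sketched as follows (hypothetical names, not ClickHouse's actual code): a writer never modifies a published entry in place; it builds a fresh entry and swaps the shared_ptr in the map, so readers that already hold the old pointer keep a consistent snapshot.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

using Marks = std::vector<bool>;

// Hypothetical cache mapping keys to immutable mark sets.
struct Cache
{
    std::unordered_map<std::size_t, std::shared_ptr<const Marks>> entries;

    std::shared_ptr<const Marks> read(std::size_t key) const
    {
        auto it = entries.find(key);
        return it == entries.end() ? nullptr : it->second;
    }

    void write(std::size_t key, Marks marks)
    {
        // Publish a brand-new immutable entry instead of mutating the old one.
        entries[key] = std::make_shared<const Marks>(std::move(marks));
    }
};
```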


Labels

can be tested: Allows running workflows for external contributors
pr-improvement: Pull request with some product improvements
pr-synced-to-cloud: The PR is synced to the cloud repo


4 participants