Finer-grained subquery cache propagation control#99804
Open
vvo wants to merge 72 commits intoClickHouse:masterfrom
Open
Finer-grained subquery cache propagation control#99804vvo wants to merge 72 commits intoClickHouse:masterfrom
vvo wants to merge 72 commits intoClickHouse:masterfrom
Conversation
…o query_cache_for_subqueries
…o query_cache_for_subqueries
…ckHouse into query_cache_for_subqueries
Add null guard for `query_result_cache` in `Planner::buildPlanForQueryNode` to prevent crash when a subquery explicitly sets `use_query_cache = true` but the server has no query cache configured. Remove spurious duplicate `optimize_and_compare_chain` entry in `SettingsChangesHistory.cpp` that was introduced during rebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`finalizeWriteInQueryResultCache` is a no-op when the pipeline has no cache writers, so the conditional checks are unnecessary. Making the calls unconditional is consistent with `executeQuery.cpp`'s `finalizeQueryPipelineBeforeLogging` which always calls it, and is more robust for future changes. Remove the now-unused `shouldCacheScalarSubquery` function and `query_cache_for_subqueries` extern declarations from both `ExecuteScalarSubqueriesVisitor.cpp` and `PreparedSets.cpp`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add null guard for `query_result_cache` in `Planner::buildPlanForQueryNode` to prevent crash when a subquery explicitly sets `use_query_cache = true` but the server has no query cache configured. Remove spurious duplicate `optimize_and_compare_chain` entry in `SettingsChangesHistory.cpp` that was introduced during rebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`finalizeWriteInQueryResultCache` is a no-op when the pipeline has no cache writers, so the conditional checks are unnecessary. Making the calls unconditional is consistent with `executeQuery.cpp`'s `finalizeQueryPipelineBeforeLogging` which always calls it, and is more robust for future changes. Remove the now-unused `shouldCacheScalarSubquery` function and `query_cache_for_subqueries` extern declarations from both `ExecuteScalarSubqueriesVisitor.cpp` and `PreparedSets.cpp`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Eliminate double AST clone in `QueryResultCache::Key` constructor: introduce `calculateAstHashAndQueryString` that clones once for subqueries and computes both hash and query string from the result. Add `already_cloned` parameter to `calculateAstHash` to skip its internal clone when called with a pre-cloned AST. - Guard `extremes`/`max_result_bytes`/`max_result_rows` default normalization with `if (is_subquery)` so non-subquery cache entries are not affected (avoids hash changes and cache invalidation on upgrade for existing non-subquery entries). - Add tests for CTE subquery caching, UNION ALL subquery caching, explicit per-subquery opt-in with cache hit verification, old analyzer negative test, and `query_cache_min_query_runs` with subqueries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rence - Add query_cache_system_table_handling = 'save' to union subquery test (UNION ALL with literals triggers system table check on CI) - Fix min_runs reference to match CI output (cache hit counts differ between local Release build and CI coverage build) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The description said results are "retrieved" from cache, but this setting enables both read and write paths for subquery caching. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The AI reviewer correctly identified that `setCanUseQueryResultCache(true)` on the planner context leaks subquery opt-in into the shared query state, which could cause outer query results to be cached even when the outer query didn't set `use_query_cache`. Fix: instead of mutating the shared context, add a `skip_context_check` parameter to `checkCanWriteQueryResultCache` that bypasses the context flag check while still enforcing safety checks (nondeterministic functions, system tables). The subquery cache eligibility is now tracked locally via `local_can_use_cache` without mutating global state. Also adds Test 14: regression test proving that explicit subquery opt-in does NOT create a top-level cache entry when outer `use_query_cache = 0`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upstream master bumped to 26.4, so `query_cache_for_subqueries` setting history entry must be in the 26.4 section (was in 26.3 which is now closed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Key's operator== and KeyHasher only used ast_hash, which could cause collisions between top-level and subquery entries with the same AST. Now is_subquery is included in both equality comparison and hash computation, ensuring separate cache domains. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert contrib/NuRaft, contrib/libstemmer_c, contrib/rust_vendor and .gitignore to upstream master — these were accidentally included from the upstream merge and are not part of this feature. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Initialize `chunk_type` in `StreamInQueryResultCacheStep` to prevent potential use of uninitialized variable if `StreamType` enum is extended - Fix `system.query_cache` column description for `is_subquery` - Fix minor grammar in comment - Add missing trailing newlines in test files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflict in Planner.cpp: keep the local_can_use_cache approach (from reviewer feedback) instead of mutating the shared query context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
|
@rschu1ze @alexey-milovidov I think we're done here, let me know! |
Multiple StreamInQueryResultCacheTransform instances share the same writer. The first finalizeWrite call processes all buffered data; subsequent calls are correct no-ops. Early-exit rejections are intentional and should not be retried. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 96.72% (413/427, 0 noise lines excluded) |
Author
|
@rschu1ze @alexey-milovidov do we still want this? |
Member
|
Yes. Haven't had time to check yet, sorry. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implements finer-grained subquery cache propagation control, as requested in the review of #76252 by @alexey-milovidov.
use_query_cacheon the outer query no longer auto-propagates to subqueries. Three rules implemented inshouldUseQueryCacheForSubquery:use_query_cacheon the outer query does NOT propagate to subqueriesSELECT * FROM (SELECT ... SETTINGS use_query_cache = true)enables caching for that specific subqueryquery_cache_for_subqueries = trueenables propagation into all subqueriesAlso fixes two issues from #76252:
QueryCache→QueryResultCachechanged keys inContext.cppbut not in config files, breakingSYSTEM RELOAD CONFIGfor cache settings03381_*tests now explicitlySET enable_analyzer = 1since Planner-level subquery caching is new-analyzer onlyChangelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Individual subquery results can now be cached independently using
SETTINGS use_query_cache = trueon specific subqueries, without caching the entire outer query. A new settingquery_cache_for_subqueries = trueenables bulk propagation ofuse_query_cacheinto all subqueries. Note:use_query_cacheon the outer query no longer auto-propagates to subqueries.Documentation entry for user-facing changes
A new "Subquery Caching" section has been added to the query cache documentation explaining the three propagation modes with examples.