Skip to content

Finer-grained subquery cache propagation control#99804

Open
vvo wants to merge 72 commits intoClickHouse:masterfrom
vvo:subquery-cache-propagation
Open

Finer-grained subquery cache propagation control#99804
vvo wants to merge 72 commits intoClickHouse:masterfrom
vvo:subquery-cache-propagation

Conversation

@vvo
Copy link
Copy Markdown

@vvo vvo commented Mar 17, 2026

Implements finer-grained subquery cache propagation control, as requested in the review of #76252 by @alexey-milovidov.

use_query_cache on the outer query no longer auto-propagates to subqueries. Three rules implemented in shouldUseQueryCacheForSubquery:

  1. No auto-propagation by defaultuse_query_cache on the outer query does NOT propagate to subqueries
  2. Explicit per-subquery opt-inSELECT * FROM (SELECT ... SETTINGS use_query_cache = true) enables caching for that specific subquery
  3. Bulk propagationquery_cache_for_subqueries = true enables propagation into all subqueries

Also fixes two issues from #76252:

  • Config key mismatch: the rename from QueryCacheQueryResultCache changed keys in Context.cpp but not in config files, breaking SYSTEM RELOAD CONFIG for cache settings
  • All 03381_* tests now explicitly SET enable_analyzer = 1 since Planner-level subquery caching is new-analyzer only

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Individual subquery results can now be cached independently using SETTINGS use_query_cache = true on specific subqueries, without caching the entire outer query. A new setting query_cache_for_subqueries = true enables bulk propagation of use_query_cache into all subqueries. Note: use_query_cache on the outer query no longer auto-propagates to subqueries.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

A new "Subquery Caching" section has been added to the query cache documentation explaining the three propagation modes with examples.

nbarannik and others added 30 commits January 14, 2025 11:19
vvo and others added 5 commits March 19, 2026 08:39
Add null guard for `query_result_cache` in `Planner::buildPlanForQueryNode`
to prevent crash when a subquery explicitly sets `use_query_cache = true`
but the server has no query cache configured.

Remove spurious duplicate `optimize_and_compare_chain` entry in
`SettingsChangesHistory.cpp` that was introduced during rebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`finalizeWriteInQueryResultCache` is a no-op when the pipeline has no
cache writers, so the conditional checks are unnecessary. Making the
calls unconditional is consistent with `executeQuery.cpp`'s
`finalizeQueryPipelineBeforeLogging` which always calls it, and is
more robust for future changes.

Remove the now-unused `shouldCacheScalarSubquery` function and
`query_cache_for_subqueries` extern declarations from both
`ExecuteScalarSubqueriesVisitor.cpp` and `PreparedSets.cpp`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add null guard for `query_result_cache` in `Planner::buildPlanForQueryNode`
to prevent crash when a subquery explicitly sets `use_query_cache = true`
but the server has no query cache configured.

Remove spurious duplicate `optimize_and_compare_chain` entry in
`SettingsChangesHistory.cpp` that was introduced during rebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`finalizeWriteInQueryResultCache` is a no-op when the pipeline has no
cache writers, so the conditional checks are unnecessary. Making the
calls unconditional is consistent with `executeQuery.cpp`'s
`finalizeQueryPipelineBeforeLogging` which always calls it, and is
more robust for future changes.

Remove the now-unused `shouldCacheScalarSubquery` function and
`query_cache_for_subqueries` extern declarations from both
`ExecuteScalarSubqueriesVisitor.cpp` and `PreparedSets.cpp`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Eliminate double AST clone in `QueryResultCache::Key` constructor:
  introduce `calculateAstHashAndQueryString` that clones once for
  subqueries and computes both hash and query string from the result.
  Add `already_cloned` parameter to `calculateAstHash` to skip its
  internal clone when called with a pre-cloned AST.

- Guard `extremes`/`max_result_bytes`/`max_result_rows` default
  normalization with `if (is_subquery)` so non-subquery cache entries
  are not affected (avoids hash changes and cache invalidation on
  upgrade for existing non-subquery entries).

- Add tests for CTE subquery caching, UNION ALL subquery caching,
  explicit per-subquery opt-in with cache hit verification, old
  analyzer negative test, and `query_cache_min_query_runs` with
  subqueries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rence

- Add query_cache_system_table_handling = 'save' to union subquery test
  (UNION ALL with literals triggers system table check on CI)
- Fix min_runs reference to match CI output (cache hit counts differ
  between local Release build and CI coverage build)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vvo and others added 3 commits March 19, 2026 13:25
The description said results are "retrieved" from cache, but this
setting enables both read and write paths for subquery caching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The AI reviewer correctly identified that `setCanUseQueryResultCache(true)`
on the planner context leaks subquery opt-in into the shared query state,
which could cause outer query results to be cached even when the outer
query didn't set `use_query_cache`.

Fix: instead of mutating the shared context, add a `skip_context_check`
parameter to `checkCanWriteQueryResultCache` that bypasses the context
flag check while still enforcing safety checks (nondeterministic
functions, system tables). The subquery cache eligibility is now tracked
locally via `local_can_use_cache` without mutating global state.

Also adds Test 14: regression test proving that explicit subquery opt-in
does NOT create a top-level cache entry when outer `use_query_cache = 0`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vvo and others added 2 commits March 20, 2026 00:03
Upstream master bumped to 26.4, so `query_cache_for_subqueries` setting
history entry must be in the 26.4 section (was in 26.3 which is now closed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh clickhouse-gh bot added the manual approve Manual approve required to run CI label Mar 19, 2026
vvo and others added 2 commits March 20, 2026 00:04
@rschu1ze rschu1ze self-assigned this Mar 19, 2026
vvo and others added 4 commits March 20, 2026 00:40
The Key's operator== and KeyHasher only used ast_hash, which could cause
collisions between top-level and subquery entries with the same AST.
Now is_subquery is included in both equality comparison and hash
computation, ensuring separate cache domains.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert contrib/NuRaft, contrib/libstemmer_c, contrib/rust_vendor and
.gitignore to upstream master — these were accidentally included from
the upstream merge and are not part of this feature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Initialize `chunk_type` in `StreamInQueryResultCacheStep` to prevent
  potential use of uninitialized variable if `StreamType` enum is extended
- Fix `system.query_cache` column description for `is_subquery`
- Fix minor grammar in comment
- Add missing trailing newlines in test files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflict in Planner.cpp: keep the local_can_use_cache
approach (from reviewer feedback) instead of mutating the shared
query context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vvo
Copy link
Copy Markdown
Author

vvo commented Mar 20, 2026

@rschu1ze @alexey-milovidov I think we're done here, let me know!

vvo and others added 2 commits March 20, 2026 12:20
Multiple StreamInQueryResultCacheTransform instances share the same
writer. The first finalizeWrite call processes all buffered data;
subsequent calls are correct no-ops. Early-exit rejections are
intentional and should not be retried.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 24, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 83.80% -0.30%
Functions 24.50% 24.50% +0.00%
Branches 76.70% 76.30% -0.40%

PR changed lines: PR changed-lines coverage: 96.72% (413/427, 0 noise lines excluded)
Diff coverage report
Uncovered code

@vvo
Copy link
Copy Markdown
Author

vvo commented Apr 1, 2026

@rschu1ze @alexey-milovidov do we still want this?

@rschu1ze
Copy link
Copy Markdown
Member

rschu1ze commented Apr 1, 2026

Yes. Haven't had time to check yet, sorry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors manual approve Manual approve required to run CI pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants