Skip to content

Support Materialized CTE#94849

Merged
KochetovNicolai merged 74 commits intomasterfrom
materialzied_cte
Mar 21, 2026
Merged

Support Materialized CTE#94849
KochetovNicolai merged 74 commits intomasterfrom
materialzied_cte

Conversation

@novikd
Copy link
Copy Markdown
Member

@novikd novikd commented Jan 22, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Support Materialized CTE. Allow evaluating CTEs only once during query execution and store their results in temporary tables. Closes #53449.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

This PR used one commit and some test queries written by @canhld94 in #61086. Closes #61086.


Note

High Risk
Touches core analyzer/planner/optimizer and execution pipeline to introduce new temp-table materialization paths and a new setting, which can affect query correctness/performance and distributed/parallel replicas behavior.

Overview
Adds experimental materialized CTEs via WITH name AS MATERIALIZED (subquery), guarded by new setting enable_materialized_cte.

The analyzer/planner now tracks is_materialized on QueryNode/UnionNode, parses/formats the new syntax, resolves materialized CTE identifiers by converting them to TableNodes backed by temporary StorageMemory, forbids correlated materialized CTEs, and errors when combined with WITH RECURSIVE.

Query planning/execution gains CTE materialization steps (MaterializingCTETransform, MaterializingCTEStep, delayed planning + optimization pass) to run CTE subplans once before the main query, coordinate reuse across sets/PK analysis/distributed queries, and inline single-use materialized CTEs back to avoid external-table conflicts; adds tests and docs for behavior and edge cases.

Written by Cursor Bugbot for commit 1753b65. This will update automatically on new commits. Configure here.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jan 22, 2026

Workflow [PR], commit [8457ed6]

Summary:

job_name test_name status info comment
AST fuzzer (arm_asan_ubsan) failure
AddressSanitizer: stack-use-after-scope (STID: 2136-3203) FAIL cidb IGNORED

AI Review

Summary

This PR adds experimental MATERIALIZED CTE support with planner/analyzer/pipeline integration and broad test coverage. I found one additional correctness issue not yet commented inline by clickhouse-gh[bot], plus two existing bot-reported issues (one blocker and one typo/doc nit). Overall verdict: request changes until the two correctness issues are fixed.

Findings

  • ❌ Blockers

    • [src/Analyzer/QueryNode.cpp:353,400] isEqualImpl ignores is_materialized when options.ignore_cte = true, but updateTreeHashImpl always hashes is_materialized. This violates the hash/equality contract and can break hash-based containers.
    • Suggested fix: include/exclude is_materialized in hash under the same ignore_cte condition as equality.
  • ⚠️ Majors

    • [src/Analyzer/inlineMaterializedCTEIfNeeded.cpp:38] removeExternalTable is called through getQueryContext() without checking hasQueryContext(). In contexts where query context is absent, this throws THERE_IS_NO_QUERY.
    • Suggested fix: guard removeExternalTable by hasQueryContext() (or skip that removal path when there is no query context).
  • 💡 Nits

    • [docs/en/sql-reference/statements/select/with.md:19] Typo: use its instead of it's (possessive).

ClickHouse Rules

Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny ⚠️ Analyzer/planner core paths have unresolved correctness issue in context handling.
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
Safe rollout ⚠️ Rollout risk remains until correctness issues are fixed.
Compilation time

Final Verdict

  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Fix hash/equality contract mismatch for QueryNode materialized-CTE flag.
    • Fix inlineMaterializedCTEIfNeeded to avoid calling getQueryContext() when there is no query context.

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Jan 22, 2026
Prewhere filter
Prewhere filter column: greaterOrEquals(a, 0)
Filter column: and(indexHint(less(i, 100)), less(multiply(2, b), 100)) (removed)
Filter column: and(less(multiply(2, b), 100), indexHint(less(i, 100))) (removed)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of conditions somehow depends on the hash value during conversion to CNF. I think it's okay because it reverts the change from #82617.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the order changes all the time, probably not for the last time then.

@novikd novikd marked this pull request as ready for review January 30, 2026 17:10
@Fgrtue Fgrtue self-assigned this Feb 2, 2026
@novikd
Copy link
Copy Markdown
Member Author

novikd commented Mar 20, 2026

Fix for Logical error: Unexpected return type from A. Expected B. Got C. Action: (STID: 1611-34f1): #100182

@alexey-milovidov
Copy link
Copy Markdown
Member

Inconsistent AST formatting - fixed in #100174

@novikd novikd added this pull request to the merge queue Mar 20, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 20, 2026
/// If table is not removed from Context, it would cause an exception in distributed queries,
/// because the temporary table would already exist in the Context
/// and there would be an attempt to read it to send the temporary table to remote servers.
getContext()->getQueryContext()->removeExternalTable(table_node->getTemporaryTableName());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inlineMaterializedCTEIfNeeded clears reused_materialized_cte when context->hasQueryContext() is false, but InlineMaterializedCTEsVisitor::enterImpl still unconditionally calls getContext()->getQueryContext()->removeExternalTable(...) for non-reused CTEs.

That throws THERE_IS_NO_QUERY in analyzer-only contexts and can fail queries that use MATERIALIZED CTEs without a query context.

Please guard this call with hasQueryContext() (or skip external-table removal entirely when there is no query context).

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 20, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.70% 83.80% +0.10%
Functions 24.00% 24.00% +0.00%
Branches 76.30% 76.40% +0.10%

PR changed lines: PR changed-lines coverage: 94.19% (924/981, 6 noise lines excluded)
Diff coverage report
Uncovered code

@KochetovNicolai KochetovNicolai added this pull request to the merge queue Mar 21, 2026
Merged via the queue into master with commit 2a440a2 Mar 21, 2026
150 of 152 checks passed
@KochetovNicolai KochetovNicolai deleted the materialzied_cte branch March 21, 2026 17:55
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 21, 2026
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Mar 21, 2026
robot-ch-test-poll3 added a commit that referenced this pull request Mar 22, 2026
Cherry pick #94849 to 26.3: Support Materialized CTE
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 22, 2026
clickhouse-gh bot added a commit that referenced this pull request Mar 22, 2026
Backport #94849 to 26.3: Support Materialized CTE
@nikitamikhaylov nikitamikhaylov added the post-approved Approved, but after the PR is merged. label Mar 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

post-approved Approved, but after the PR is merged. pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-feature Pull request with new product feature pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo v26.3-must-backport

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WITH cte AS [NOT] MATERIALIZED (SELECT ...) SELECT * FROM cte