Properly remove unneeded columns for lazy materialization. by KochetovNicolai · Pull Request #96682 · ClickHouse/ClickHouse

KochetovNicolai · 2026-02-11T18:18:01Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix Block structure mismatch in stream error caused by unnecessary columns returned from Lazy materialization.
Fixes #95191

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Note

Medium Risk
Touches lazy materialization query-plan splitting and required-column propagation, which can affect correctness of query results and headers across filters/expressions. Changes are localized but involve non-trivial DAG dependency and header-position accounting.

Overview
Fixes lazy materialization planning so the main branch only forwards columns that are actually required/available, preventing extra columns (e.g. from PREWHERE) from polluting downstream headers and causing Block structure mismatch (notably with parallel replicas).

This refactors required-column tracking to distinguish DAG-output requirements vs step-input header positions, adds transitive dependency inclusion (addRequiredInputDependenciesIntoNodesSet), and updates splitExpressionStep/splitFilterStep to compute the next step’s available inputs (including handling of removed filter columns) instead of relying on a final projection cleanup. Adds a stateless regression test covering the parallel-replicas failure case.

^{Written by Cursor Bugbot for commit d7f724b. This will update automatically on new commits. Configure here.}

clickhouse-gh · 2026-02-11T18:18:40Z

Workflow [PR], commit [7698acd]

Summary: ⏳

job_name	test_name	status	info	comment
Bugfix validation (functional tests)		failure
	03814_lazy_materialization_unneeded_columns_bug	FAIL	cidb

AI Review

Summary

This PR fixes lazy materialization column propagation by tracking required DAG outputs separately from step-input availability, then splitting ExpressionStep/FilterStep accordingly. The change addresses the reported Block structure mismatch in stream scenario and includes a targeted stateless regression test. I did not find new correctness, safety, concurrency, or performance issues beyond already-reported bot typo comments.

Missing context

⚠️ No CI logs or benchmark artifacts were provided in the review context, so validation is limited to static diff analysis and test intent.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	✅
Safe rollout	✅
Compilation time	✅

Final Verdict

Status: ✅ Approve

…WHERE and parallel replicas" This reverts commit 10f0f02.

Copilot

Pull request overview

This PR fixes a Block structure mismatch in stream error caused by lazy materialization returning unnecessary columns. The issue occurred when PREWHERE added extra columns that weren't consumed by subsequent operations, particularly in parallel replica scenarios.

Changes:

Refactored the lazy materialization optimization to properly track and remove unneeded columns
Introduced available_input_positions to track which columns are actually available after filtering
Added logic to transitively determine required dependencies in the ActionsDAG

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.

File	Description
tests/queries/0_stateless/03814_lazy_materialization_unneded_columns_bug.sql	Adds regression test for the lazy materialization bug with parallel replicas
src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp	Refactors column tracking logic to properly handle available vs required columns and removes workaround projection step

Copilot · 2026-02-18T13:05:02Z

tests/queries/0_stateless/03814_lazy_materialization_unneeded_columns_bug.sql

+SELECT length(thread_ids) > 0
+FROM system.query_log
+WHERE (current_database = currentDatabase()) AND (event_date >= (today() - 1)) AND (lower(query) LIKE '%abcd%') AND (type = 'QueryFinish')
+ORDER BY query_start_time DESC
+LIMIT 1
+SETTINGS enable_parallel_replicas = 1, query_plan_optimize_lazy_materialization = 1, max_parallel_replicas = 2, cluster_for_parallel_replicas = 'parallel_replicas', parallel_replicas_for_non_replicated_merge_tree = 1


Corrected spelling of 'unneded' to 'unneeded' in filename.

Copilot · 2026-02-18T13:05:02Z

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp

+    }
+}
+
+/// requred_outputs are outputs of ActionsDAG, however required_inputs are inputs corresponding to the step input header.


Corrected spelling of 'requred' to 'required'.

Suggested change

/// requred_outputs are outputs of ActionsDAG, however required_inputs are inputs corresponding to the step input header.

/// required_outputs are outputs of ActionsDAG, however required_inputs are inputs corresponding to the step input header.

Copilot · 2026-02-18T13:05:02Z

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp

+    // std::cerr << "split nodes: " << split_nodes.size() << std::endl;
+    // for (const auto * node : split_nodes)
+    //     std::cerr << "  " << node->result_name << std::endl;
+


Remove commented-out debugging code before merging to production.

Suggested change

// std::cerr << "split nodes: " << split_nodes.size() << std::endl;

// for (const auto * node : split_nodes)

// std::cerr << " " << node->result_name << std::endl;

Copilot · 2026-02-18T13:05:03Z

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp

+        // std::cerr << "::: req columns (" << required_columns.size() << ") [";
+        // for (auto && required_column : required_columns)
+        //     std::cerr << required_column << " ";
+        // std::cerr << "]\n";


Remove commented-out debugging code before merging to production.

Suggested change

// std::cerr << "::: req columns (" << required_columns.size() << ") [";

// for (auto && required_column : required_columns)

// std::cerr << required_column << " ";

// std::cerr << "]\n";

Copilot · 2026-02-18T13:05:03Z

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp

+        // std::cerr << ".. Main header " << read_from_merge_tree->getOutputHeader()->dumpNames() << std::endl;
+        // std::cerr << ".. Lazy header " << lazy_reading->getOutputHeader()->dumpNames() << std::endl;


Remove commented-out debugging code before merging to production.

Copilot · 2026-02-18T13:05:03Z

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp

+        // std::cerr << " req columns (" << required_columns.size() << ") [";
+        // for (auto && required_column : required_columns)
+        //     std::cerr << required_column << " ";
+        // std::cerr << "]\n";


Remove commented-out debugging code before merging to production.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

cursor · 2026-03-05T11:09:59Z

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp

+
+    // std::cerr << "split nodes: " << split_nodes.size() << std::endl;
+    // for (const auto * node : split_nodes)
+    //     std::cerr << "  " << node->result_name << std::endl;


Commented-out debug statements left in production code

Low Severity

Multiple new commented-out std::cerr debug statements were added across splitExpressionStep, the forward pass loop, and the reading step modification block. While existing commented debug code was already present in this file, these new instances (lines 267–269, 467–470, 528–529, 537–540) appear to be leftover development/debugging aids that weren't cleaned up before the PR.

Additional Locations (2)

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp#L466-L470

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp#L527-L529

clickhouse-gh · 2026-03-23T14:49:32Z

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp

+/// This function transitively adds ActionsDAG::Node into the set, if all the arguments are already in set (or constants).
+/// It's useful because the main branch of lazy materialization can return more columns than actually required.
+/// As an example, for the query `select a from table prewhere b > 0 order by c limit 1`, only columns `c` is required for ORDER BY,
+/// but the column `a` is returned as well (it's need for PREWHERE).


Minor typo in the comment: it's need for PREWHERE → it's needed for PREWHERE.

clickhouse-gh · 2026-03-23T14:49:33Z

tests/queries/0_stateless/03814_lazy_materialization_unneeded_columns_bug.sql

@@ -0,0 +1,7 @@
+SELECT length(thread_ids) > 0


Typo in the test name/path: unneded should be unneeded (also update the corresponding .reference file name).

clickhouse-gh · 2026-03-23T14:54:17Z

src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp

-            required_columns = getRequiredHeaderPositions(expr_step->getExpression(), *expr_step->getInputHeaders().front() , std::move(required_columns));
+        {
+            const auto & expr = expr_step->getExpression();
+            /// The number of DAG outputs can be less then the number of columns in the header.


Typo: less then should be less than in this comment.

Please fix it to keep the code comments clear and consistent.

devcrafter · 2026-03-18T14:09:43Z

tests/queries/0_stateless/03814_lazy_materialization_unneeded_columns_bug.sql

@@ -0,0 +1,7 @@
+SELECT length(thread_ids) > 0
+FROM system.query_log
+WHERE (current_database = currentDatabase()) AND (event_date >= (today() - 1)) AND (lower(query) LIKE '%abcd%') AND (type = 'QueryFinish')


Let's execute some deterministic query. It someone will use abcd in test queries, - this one will become flaky

But we also check for the current database, which is unique.

clickhouse-gh · 2026-03-24T19:25:36Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.10%	84.00%	-0.10%
Functions	24.50%	24.50%	+0.00%
Branches	76.60%	76.60%	+0.00%

PR changed lines: PR changed-lines coverage: 97.56% (200/205, 2 noise lines excluded)
Diff coverage report
Uncovered code

clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Feb 11, 2026

KochetovNicolai added 2 commits February 11, 2026 18:21

Properly remove unneeded columns for lazy materialization.

a88e77e

Revert "Fix block structure mismatch in lazy materialization with PRE…

e1ebc73

…WHERE and parallel replicas" This reverts commit 10f0f02.

KochetovNicolai force-pushed the fix-lm-unused_columns branch from ebeb5e2 to e1ebc73 Compare February 11, 2026 18:22

KochetovNicolai added 2 commits February 13, 2026 13:59

Fixing a test.

c8d4164

Cleanup and add comments.

18801f2

KochetovNicolai marked this pull request as ready for review February 13, 2026 15:28

Fix style.

c0853ab

devcrafter self-assigned this Feb 17, 2026

devcrafter requested a review from Copilot February 18, 2026 13:02

Copilot started reviewing on behalf of devcrafter February 18, 2026 13:03 View session

Copilot AI reviewed Feb 18, 2026

View reviewed changes

Merge branch 'master' into fix-lm-unused_columns

d7f724b

cursor bot reviewed Mar 5, 2026

View reviewed changes

devcrafter and others added 3 commits March 18, 2026 14:46

Merge branch 'master' into fix-lm-unused_columns

0480337

Merge remote-tracking branch 'origin/master' into fix-lm-unused_columns

77e2694

Cleanup

30fb4bc

clickhouse-gh bot reviewed Mar 23, 2026

View reviewed changes

Fix test name

c5cb86b

clickhouse-gh bot reviewed Mar 23, 2026

View reviewed changes

devcrafter approved these changes Mar 23, 2026

View reviewed changes

Update optimizeLazyMaterialization.cpp

7e1a9eb

Merge branch 'master' into fix-lm-unused_columns

7698acd

	/// requred_outputs are outputs of ActionsDAG, however required_inputs are inputs corresponding to the step input header.
	/// required_outputs are outputs of ActionsDAG, however required_inputs are inputs corresponding to the step input header.

	// std::cerr << "split nodes: " << split_nodes.size() << std::endl;
	// for (const auto * node : split_nodes)
	// std::cerr << " " << node->result_name << std::endl;

		// std::cerr << ".. Main header " << read_from_merge_tree->getOutputHeader()->dumpNames() << std::endl;
		// std::cerr << ".. Lazy header " << lazy_reading->getOutputHeader()->dumpNames() << std::endl;

Conversation

KochetovNicolai commented Feb 11, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh bot commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Missing context

ClickHouse Rules

Final Verdict

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Mar 5, 2026

Choose a reason for hiding this comment

Commented-out debug statements left in production code

Uh oh!

clickhouse-gh bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

devcrafter Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

KochetovNicolai Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh bot commented Mar 24, 2026

LLVM Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

KochetovNicolai commented Feb 11, 2026 •

edited by cursor bot

Loading

clickhouse-gh bot commented Feb 11, 2026 •

edited

Loading