Skip to content

Fix Block structure mismatch with normal projections and parallel replicas#99497

Merged
alexey-milovidov merged 22 commits intomasterfrom
fix-projection-parallel-replicas-mismatch
Apr 10, 2026
Merged

Fix Block structure mismatch with normal projections and parallel replicas#99497
alexey-milovidov merged 22 commits intomasterfrom
fix-projection-parallel-replicas-mismatch

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented Mar 14, 2026

When optimizeUseNormalProjections replaces the read chain with a projection-based read where all parts come from the projection, the new chain may produce extra columns (e.g. from PREWHERE column restoration in splitAndFillPrewhereInfo). With parallel replicas, the local plan uses the projection while the remote plan does not, so the UnionStep combining them gets mismatched headers — the local side has extra columns the remote side doesn't.

The fix adds a converting step when the projection chain's output differs from the original chain's output. This mirrors the existing compatibility check already present for the mixed projection+parent-parts code path (line 459).

Closes #96131. Closes #95191.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=73f45b0717e3a4763b7d1cde71950a076b5b00c7&name_0=MasterCI&name_1=Stress%20test%20%28azure%2C%20amd_msan%29

Changelog category (leave one):

  • Not for changelog

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix exception "Block structure mismatch in UnionStep stream" when using normal projections with parallel replicas.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

… parallel replicas

When `optimizeUseNormalProjections` replaces the read chain with a
projection-based read, the new chain may output extra columns (e.g.,
`region` from PREWHERE restoration) that the original chain did not have.
This causes a "Block structure mismatch in UnionStep stream" exception
in debug/sanitizer builds when the local plan (with projection) is
combined with the remote plan (without projection) in parallel replicas.

The fix adds a converting step when the projection chain's output header
differs from the expected output, ensuring headers stay consistent.
This mirrors the existing header compatibility check already present
for the mixed projection+parent-parts case (lines 459-460).

https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=73f45b0717e3a4763b7d1cde71950a076b5b00c7&name_0=MasterCI&name_1=Stress%20test%20%28azure%2C%20amd_msan%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 14, 2026

Workflow [PR], commit [eb9f32e]

Summary:


AI Review

Summary

This PR adds a new stateless regression test for the UnionStep header mismatch around normal projections with parallel replicas. The test setup and data generation are clear, but the key assertion currently verifies projection usage (region_proj) rather than the specific header-conversion behavior this fix intends to protect, so it may not reliably catch regressions of that exact bug.

Findings

⚠️ Majors

  • [tests/queries/0_stateless/04039_normal_projection_parallel_replicas.sql:55-60] The regression check is not specific to the conversion path that prevents UnionStep header mismatch. It asserts that a projection is used, which can still be true even when conversion-specific behavior regresses.
    • Suggested fix: add a deterministic assertion tied to the conversion path (for example via EXPLAIN PIPELINE/plan-shape evidence of the conversion stage or an assertion that fails on code without the conversion fix).
Tests
  • ⚠️ Add a regression assertion that is specific to the header-alignment conversion, not only projection selection.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required action: make the new regression check validate the conversion/header-alignment behavior directly, not only projection selection.

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 14, 2026
@devcrafter devcrafter self-assigned this Mar 15, 2026
@devcrafter
Copy link
Copy Markdown
Member

devcrafter commented Mar 17, 2026

Here is the reason we can have unused columns after PREWHERE in header -

/// This is the leak of abstraction.
/// Split actions may have inputs which are needed only for PREWHERE.
/// This is fine for ActionsDAG to have such a split, but it breaks defaults calculation.
///
/// See 00950_default_prewhere for example.
/// Table has structure `APIKey UInt8, SessionType UInt8` and default `OperatingSystem = SessionType+1`
/// For a query with `SELECT OperatingSystem WHERE APIKey = 42 AND SessionType = 42` we push everything to PREWHERE
/// and columns APIKey, SessionType are removed from inputs (cause only OperatingSystem is needed).
/// However, column OperatingSystem is calculated after PREWHERE stage, based on SessionType value.
/// If column SessionType is removed by PREWHERE actions, we use zero as default, and get a wrong result.
///
/// So, here we restore removed inputs for PREWHERE actions
{
std::unordered_set<const ActionsDAG::Node *> first_outputs(
split_result.first.getOutputs().begin(), split_result.first.getOutputs().end());
for (const auto * input : split_result.first.getInputs())
{
if (!first_outputs.contains(input))
{
split_result.first.getOutputs().push_back(input);
/// Add column to second actions as input.
/// Do not add it to result, so it would be removed.
split_result.second.addInput(input->result_name, input->result_type);
}
}
}

It fixes another issue. So, it seems it's a job for query_plan_remove_unused_columns optimization, but currently it's not happening

Copy link
Copy Markdown
Member

@devcrafter devcrafter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more symptom fix. Proper fix will take some effort - some thoughts here

alexey-milovidov and others added 10 commits March 24, 2026 00:32
The previous test had too few rows (2) to trigger `optimizePrewhere`,
which is needed to produce the extra PREWHERE restoration columns that
cause the Block structure mismatch. With 10000 rows, column sizes are
meaningful and the `MergeTreeWhereOptimizer` moves the filter to
PREWHERE. The PREWHERE column restoration then adds `region` as an
extra output that persists through `ActionsDAG::mergeInplace` into the
projection query DAG, causing the header mismatch with the remote
replica plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The bugfix validation was failing because commit 0f305e6 (already
in master) moved the debug assertion in `UnionStep::updatePipeline`
after the header conversion code. This means the pipeline-level safety
net silently handles mismatched headers even in debug builds, so the
test could never reproduce the runtime error against the master binary.

Add an EXPLAIN PLAN check that detects the plan-level fix: the
converting `ExpressionStep` added by `optimizeUseNormalProjection`
creates two consecutive Expression steps in the plan (converting +
query processing). Without the fix, only the query processing
Expression exists. This structural difference is observable regardless
of build type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use `groupArray` + `arrayExists` with a lambda to detect consecutive
Expression steps, instead of a self-join on two separate EXPLAIN
queries. This is both simpler and avoids potential non-determinism
from running two independent EXPLAIN executions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The EXPLAIN-based plan check for consecutive Expression steps was
unreliable: it returned the same result on both the master binary
(without the plan-level fix) and the patched binary because other
plan optimizations also create consecutive Expression steps.

The actual runtime fix for the `Block structure mismatch in UnionStep`
exception was merged via PR #99515, which moved the debug assertion
in `UnionStep::updatePipeline` after the header conversion code.
This PR's plan-level fix in `optimizeUseNormalProjections` is a
defense-in-depth measure that prevents the header mismatch from
reaching the pipeline build stage.

Keep only the query correctness check as a regression test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `optimizeLazyMaterialization2` optimization adds its own projection
step that strips extra PREWHERE columns, masking the header mismatch bug
in `optimizeUseNormalProjections`. Disable it so the bugfix validation
can distinguish between the fixed and unfixed code paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 30, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.70% 76.70% +0.00%

Changed lines: 100.00% (18/18) · Uncovered code

Full report · Diff report

alexey-milovidov and others added 7 commits March 30, 2026 16:38
The bugfix validation test was failing because the query-success check
alone cannot distinguish between master and this PR — PR #99515 already
handles the header mismatch at pipeline level, so the query succeeds on
both.

Add a `setStepDescription` call to the converting `ExpressionStep` in
`optimizeUseNormalProjections` so it appears as "Convert projection
output to match expected header" in EXPLAIN output. The test now checks
for this specific step description, which is absent on master and
present with this fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The EXPLAIN check for "Convert projection output to match expected header"
returns 0 on current master because the projection and original reading
output headers now match (PREWHERE column handling was improved upstream).
The converting step in `optimizeUseNormalProjections` is defensive code
that correctly handles header mismatches when they occur, but the test
scenario no longer triggers it.

Replace the assertion with a check that the `region_proj` projection is
actually used in the plan, which validates that the projection optimization
runs correctly with parallel replicas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The converting step in `optimizeUseNormalProjections` is no longer needed:
the header mismatch is handled at the pipeline level by #99515, and
recent master changes to PREWHERE column handling eliminated the
mismatch at the plan level too.

Keep the test as a regression test for normal projections with parallel
replicas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov added pr-not-for-changelog This PR should not be mentioned in the changelog and removed pr-bugfix Pull request with bugfix, not backported by default labels Apr 8, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Copy link
Copy Markdown
Member Author

Now test-only change.

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Apr 10, 2026
Merged via the queue into master with commit d085036 Apr 10, 2026
321 of 322 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-projection-parallel-replicas-mismatch branch April 10, 2026 04:38
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

3 participants