Fix column count mismatch in HashJoin::getNonJoinedBlocks after join side swap by alexey-milovidov · Pull Request #100617 · ClickHouse/ClickHouse

alexey-milovidov · 2026-03-24T15:09:38Z

After join side swapping (e.g., LEFT ANTI → RIGHT ANTI for GLOBAL NOT IN), a column name can appear in both the left block and sample_block_with_columns_to_add or required_right_keys. There can also be duplicates within the left block itself. The result_sample_block deduplicates such columns, so the previous check that compared raw column counts was too strict and triggered a LOGICAL_ERROR exception.

Replace the raw-sum check with a unique-name-set comparison that correctly handles all deduplication cases.

Closes #100422

CI reports:

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100378&sha=ce356689f6a6d126e078f5cbfcbab8b3849de673&name_0=PR&name_1=AST%20fuzzer%20%28amd_tsan%29

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix LOGICAL_ERROR exception "Unexpected number of columns in result sample block" in HashJoin::getNonJoinedBlocks that could occur with GLOBAL IN / GLOBAL NOT IN queries when the join optimizer swaps sides.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

🤖 Generated with Claude Code

…n side swap After join side swapping (e.g., LEFT ANTI → RIGHT ANTI for GLOBAL NOT IN), a column name can appear in both the left block and `sample_block_with_columns_to_add` or `required_right_keys`. There can also be duplicates within the left block itself. The `result_sample_block` deduplicates such columns, so the previous check that compared raw column counts (left + right_keys + right_add) against `result_sample_block.columns()` was too strict and triggered a LOGICAL_ERROR exception. Replace the raw-sum check with a unique-name-set comparison that correctly handles all deduplication cases. Closes #100422 https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100378&sha=ce356689f6a6d126e078f5cbfcbab8b3849de673&name_0=PR&name_1=AST%20fuzzer%20%28amd_tsan%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-03-24T15:10:20Z

Workflow [PR], commit [79cad63]

Summary: ❌

job_name	test_name	status	info
Fast test		failure
	Server died	FAIL	cidb
	clickhouse-test	FAIL	cidb
Build (amd_debug)		dropped
Build (amd_asan_ubsan)		dropped
Build (amd_tsan)		dropped
Build (amd_msan)		dropped
Build (amd_binary)		dropped
Build (arm_debug)		dropped
Build (arm_asan_ubsan)		dropped
Build (arm_tsan)		dropped
Build (arm_msan)		dropped

AI Review

Summary

This PR fixes a real LOGICAL_ERROR path in HashJoin::getNonJoinedBlocks by replacing a raw column-count invariant with a unique-name-set comparison, which matches actual deduplication behavior after join-side swap. The core logic change looks correct and targeted, and the PR includes a dedicated regression test. I found one major issue in test robustness (already noted by an existing inline comment) and one additional major issue in exception-message quality.

Findings

⚠️ Majors

[src/Interpreters/HashJoin/HashJoin.cpp:1743] The invariant now validates set equality, but the thrown error still says Unexpected number of columns and reports an additive decomposition ([left] + [right] + [to_add]) that is not semantically correct after deduplication. This makes debugging future mismatches unnecessarily confusing.
- Suggested fix: reword the message to describe column-set mismatch and include expected/actual names (or missing/extra names) instead of count arithmetic.

Tests

⚠️ tests/queries/0_stateless/04057_global_not_in_join_swap_column_overlap.sh currently suppresses all failures for both repro queries via >/dev/null 2>&1 ||:. This means the test can pass even if the original LOGICAL_ERROR regresses. Add an assertion that the specific error text is absent (or fail on non-zero exit except for explicitly allowed errors).

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	✅
Safe rollout	✅
Compilation time	✅

Final Verdict

Status: ⚠️ Request changes
Minimum required actions:
- Make the new regression test assertive (do not ignore all failures with ||:).
- Update the HashJoin::getNonJoinedBlocks exception text to match set-based validation semantics.

groeneai · 2026-03-25T17:33:17Z

Hi @alexey-milovidov, the fix direction is correct but the NameSet approach over-deduplicates. Here's why the 8 tests fail and a suggested correction.

Why the 8 Tests Fail

The 8 failing tests (00370_duplicate_columns_in_subqueries, 02497_storage_join_right_assert, etc.) contain queries with within-side duplicate column names — e.g. the same subquery column appearing twice on the left side:

SELECT 'y', * FROM (SELECT count('y'), count('y'), 2 AS x) AS t1 
RIGHT JOIN (SELECT count('x'), count('y'), 3 AS x) AS t2 ON t1.x = t2.x;

Here count('y') appears twice in t1 (left side). The result_sample_block legitimately has both columns (count: ≥5). But NameSet deduplicates count('y') to a single entry → expected_names.size() < result_sample_block.columns() → assertion fires incorrectly.

The Root Difference

STID 2980-385f case: expected > actual — same name in left AND right → result deduplicates cross-side → expected overcounts
Failing tests: expected < actual — same name twice within left → NameSet undercounts

Suggested Fix: Cross-Side Overlap Only

Instead of using NameSet for everything, subtract only the cross-side overlap (names that appear in both the left side and the right side / keys):

size_t left_columns_count = left_sample_block.columns();
if (canRemoveColumnsFromLeftBlock())
    left_columns_count = table_join->getOutputColumns(JoinTableSide::Left).size();

// Build unique name set for left side only
NameSet left_names;
if (canRemoveColumnsFromLeftBlock())
    for (const auto & col : table_join->getOutputColumns(JoinTableSide::Left))
        left_names.insert(col.name);
else
    for (const auto & name : left_sample_block.getNames())
        left_names.insert(name);

// Count names that appear in BOTH left and (right_keys ∪ to_add)
size_t cross_side_overlap = 0;
for (size_t i = 0; i < required_right_keys.columns(); ++i)
    if (left_names.count(required_right_keys.getByPosition(i).name))
        ++cross_side_overlap;
for (size_t i = 0; i < sample_block_with_columns_to_add.columns(); ++i)
    if (left_names.count(sample_block_with_columns_to_add.getByPosition(i).name))
        ++cross_side_overlap;

size_t expected_columns_count = left_columns_count
    + required_right_keys.columns()
    + sample_block_with_columns_to_add.columns()
    - cross_side_overlap;

This:

Preserves within-left duplicates in the count → no false positive for {count('y'), count('y'), x}
Correctly subtracts when the same name appears in both left and right → fixes STID 2980-385f
Minimal change from the original logic — just adds the overlap subtraction

…rlap-getNonJoinedBlocks

The previous fix compared `expected_names.size()` (unique) against `result_sample_block.columns()` (raw count including duplicates). When duplicate column names exist in the result block (e.g., after join side swap), `result_sample_block` also has duplicates, so the comparison was still incorrect. Compare unique name sets on both sides instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-03-28T16:49:23Z

tests/queries/0_stateless/04057_global_not_in_join_swap_column_overlap.sh

+    <= toInt64(materialize(9223372036854775807))
+FORMAT Null
+SETTINGS enable_analyzer = 1
+" >/dev/null 2>&1 ||:


The regression check is currently ineffective: both repro queries end with >/dev/null 2>&1 ||:, so the test passes even when HashJoin::getNonJoinedBlocks throws the same LOGICAL_ERROR this PR is fixing.

Could we assert that this specific error is absent instead of ignoring all failures? For example, capture stderr/stdout and fail if it contains Unexpected number of columns in result sample block / LOGICAL_ERROR.

After join side swapping (e.g., LEFT ANTI → RIGHT ANTI via GLOBAL NOT IN), column names can appear in both the left block and sample_block_with_columns_to_add. AddedColumns correctly skips right-side columns whose renamed name already exists in the left block, but the assertion in getNonJoinedBlocks computed the expected column count as a naive sum (left + right_keys + columns_to_add) without accounting for this cross-side overlap. The fix counts right-side columns whose renamed name appears in the left block (matching the exact deduplication logic in AddedColumns) and subtracts this overlap from the expected count. Unlike the NameSet-based approach in PR ClickHouse#100617, this preserves within-left duplicate column names which are legitimate in queries with subqueries. Fixes STID 2980-385f — P0 recurring AST fuzzer crash hitting 20+ PRs and master over the last 12 days.

clickhouse-gh · 2026-03-31T10:47:47Z

src/Interpreters/HashJoin/HashJoin.cpp

@@ -1721,7 +1741,7 @@ HashJoin::getNonJoinedBlocks(const Block & left_sample_block, const Block & resu

            throw Exception(ErrorCodes::LOGICAL_ERROR,
                            "Unexpected number of columns in result sample block: {} expected {} ([{}] = [{}] + [{}] + [{}])",


The check above now compares name sets, not raw column counts, but this exception still says Unexpected number of columns and prints a pseudo-sum ([left] + [right_keys] + [columns_to_add]) that is no longer mathematically meaningful after deduplication.

Could we reword it to reflect set mismatch (for example, Unexpected columns in result sample block) and print the expected name set directly? That will make future debugging much less confusing.

clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 24, 2026

alexey-milovidov and others added 2 commits March 28, 2026 16:40

Merge remote-tracking branch 'origin/master' into fix-join-column-ove…

965ad63

…rlap-getNonJoinedBlocks

clickhouse-gh bot reviewed Mar 28, 2026

View reviewed changes

groeneai mentioned this pull request Mar 29, 2026

Fix column count assertion in HashJoin::getNonJoinedBlocks #101066

Open

Merge branch 'master' into fix-join-column-overlap-getNonJoinedBlocks

79cad63

clickhouse-gh bot reviewed Mar 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix column count mismatch in HashJoin::getNonJoinedBlocks after join side swap#100617

Fix column count mismatch in HashJoin::getNonJoinedBlocks after join side swap#100617
alexey-milovidov wants to merge 4 commits intomasterfrom
fix-join-column-overlap-getNonJoinedBlocks

alexey-milovidov commented Mar 24, 2026

Uh oh!

clickhouse-gh bot commented Mar 24, 2026 •

edited

Loading

Uh oh!

groeneai commented Mar 25, 2026

Uh oh!

clickhouse-gh bot Mar 28, 2026

Uh oh!

clickhouse-gh bot Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -1721,7 +1741,7 @@ HashJoin::getNonJoinedBlocks(const Block & left_sample_block, const Block & resu

		throw Exception(ErrorCodes::LOGICAL_ERROR,
		"Unexpected number of columns in result sample block: {} expected {} ([{}] = [{}] + [{}] + [{}])",

Conversation

alexey-milovidov commented Mar 24, 2026

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Findings

Tests

ClickHouse Rules

Final Verdict

Uh oh!

groeneai commented Mar 25, 2026

Why the 8 Tests Fail

The Root Difference

Suggested Fix: Cross-Side Overlap Only

Uh oh!

clickhouse-gh bot Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clickhouse-gh bot commented Mar 24, 2026 •

edited

Loading