Fix column count mismatch in HashJoin::getNonJoinedBlocks after join side swap#100617
Fix column count mismatch in HashJoin::getNonJoinedBlocks after join side swap#100617alexey-milovidov wants to merge 4 commits intomasterfrom
Conversation
…n side swap After join side swapping (e.g., LEFT ANTI → RIGHT ANTI for GLOBAL NOT IN), a column name can appear in both the left block and `sample_block_with_columns_to_add` or `required_right_keys`. There can also be duplicates within the left block itself. The `result_sample_block` deduplicates such columns, so the previous check that compared raw column counts (left + right_keys + right_add) against `result_sample_block.columns()` was too strict and triggered a LOGICAL_ERROR exception. Replace the raw-sum check with a unique-name-set comparison that correctly handles all deduplication cases. Closes #100422 https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100378&sha=ce356689f6a6d126e078f5cbfcbab8b3849de673&name_0=PR&name_1=AST%20fuzzer%20%28amd_tsan%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Workflow [PR], commit [79cad63] Summary: ❌
AI ReviewSummaryThis PR fixes a real Findings
Tests
ClickHouse Rules
Final Verdict
|
|
Hi @alexey-milovidov, the fix direction is correct but the NameSet approach over-deduplicates. Here's why the 8 tests fail and a suggested correction. Why the 8 Tests FailThe 8 failing tests ( SELECT 'y', * FROM (SELECT count('y'), count('y'), 2 AS x) AS t1
RIGHT JOIN (SELECT count('x'), count('y'), 3 AS x) AS t2 ON t1.x = t2.x;Here The Root Difference
Suggested Fix: Cross-Side Overlap OnlyInstead of using NameSet for everything, subtract only the cross-side overlap (names that appear in both the left side and the right side / keys): size_t left_columns_count = left_sample_block.columns();
if (canRemoveColumnsFromLeftBlock())
left_columns_count = table_join->getOutputColumns(JoinTableSide::Left).size();
// Build unique name set for left side only
NameSet left_names;
if (canRemoveColumnsFromLeftBlock())
for (const auto & col : table_join->getOutputColumns(JoinTableSide::Left))
left_names.insert(col.name);
else
for (const auto & name : left_sample_block.getNames())
left_names.insert(name);
// Count names that appear in BOTH left and (right_keys ∪ to_add)
size_t cross_side_overlap = 0;
for (size_t i = 0; i < required_right_keys.columns(); ++i)
if (left_names.count(required_right_keys.getByPosition(i).name))
++cross_side_overlap;
for (size_t i = 0; i < sample_block_with_columns_to_add.columns(); ++i)
if (left_names.count(sample_block_with_columns_to_add.getByPosition(i).name))
++cross_side_overlap;
size_t expected_columns_count = left_columns_count
+ required_right_keys.columns()
+ sample_block_with_columns_to_add.columns()
- cross_side_overlap;This:
|
…rlap-getNonJoinedBlocks
The previous fix compared `expected_names.size()` (unique) against `result_sample_block.columns()` (raw count including duplicates). When duplicate column names exist in the result block (e.g., after join side swap), `result_sample_block` also has duplicates, so the comparison was still incorrect. Compare unique name sets on both sides instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| <= toInt64(materialize(9223372036854775807)) | ||
| FORMAT Null | ||
| SETTINGS enable_analyzer = 1 | ||
| " >/dev/null 2>&1 ||: |
There was a problem hiding this comment.
The regression check is currently ineffective: both repro queries end with >/dev/null 2>&1 ||:, so the test passes even when HashJoin::getNonJoinedBlocks throws the same LOGICAL_ERROR this PR is fixing.
Could we assert that this specific error is absent instead of ignoring all failures? For example, capture stderr/stdout and fail if it contains Unexpected number of columns in result sample block / LOGICAL_ERROR.
After join side swapping (e.g., LEFT ANTI → RIGHT ANTI via GLOBAL NOT IN), column names can appear in both the left block and sample_block_with_columns_to_add. AddedColumns correctly skips right-side columns whose renamed name already exists in the left block, but the assertion in getNonJoinedBlocks computed the expected column count as a naive sum (left + right_keys + columns_to_add) without accounting for this cross-side overlap. The fix counts right-side columns whose renamed name appears in the left block (matching the exact deduplication logic in AddedColumns) and subtracts this overlap from the expected count. Unlike the NameSet-based approach in PR ClickHouse#100617, this preserves within-left duplicate column names which are legitimate in queries with subqueries. Fixes STID 2980-385f — P0 recurring AST fuzzer crash hitting 20+ PRs and master over the last 12 days.
After join side swapping (e.g., LEFT ANTI → RIGHT ANTI via GLOBAL NOT IN), column names can appear in both the left block and sample_block_with_columns_to_add. AddedColumns correctly skips right-side columns whose renamed name already exists in the left block, but the assertion in getNonJoinedBlocks computed the expected column count as a naive sum (left + right_keys + columns_to_add) without accounting for this cross-side overlap. The fix counts right-side columns whose renamed name appears in the left block (matching the exact deduplication logic in AddedColumns) and subtracts this overlap from the expected count. Unlike the NameSet-based approach in PR ClickHouse#100617, this preserves within-left duplicate column names which are legitimate in queries with subqueries. Fixes STID 2980-385f — P0 recurring AST fuzzer crash hitting 20+ PRs and master over the last 12 days.
| @@ -1721,7 +1741,7 @@ HashJoin::getNonJoinedBlocks(const Block & left_sample_block, const Block & resu | |||
|
|
|||
| throw Exception(ErrorCodes::LOGICAL_ERROR, | |||
| "Unexpected number of columns in result sample block: {} expected {} ([{}] = [{}] + [{}] + [{}])", | |||
There was a problem hiding this comment.
The check above now compares name sets, not raw column counts, but this exception still says Unexpected number of columns and prints a pseudo-sum ([left] + [right_keys] + [columns_to_add]) that is no longer mathematically meaningful after deduplication.
Could we reword it to reflect set mismatch (for example, Unexpected columns in result sample block) and print the expected name set directly? That will make future debugging much less confusing.
After join side swapping (e.g., LEFT ANTI → RIGHT ANTI for
GLOBAL NOT IN), a column name can appear in both the left block andsample_block_with_columns_to_addorrequired_right_keys. There can also be duplicates within the left block itself. Theresult_sample_blockdeduplicates such columns, so the previous check that compared raw column counts was too strict and triggered a LOGICAL_ERROR exception.Replace the raw-sum check with a unique-name-set comparison that correctly handles all deduplication cases.
Closes #100422
CI reports:
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix LOGICAL_ERROR exception "Unexpected number of columns in result sample block" in
HashJoin::getNonJoinedBlocksthat could occur withGLOBAL IN/GLOBAL NOT INqueries when the join optimizer swaps sides.Documentation entry for user-facing changes
🤖 Generated with Claude Code