Fix column count assertion in HashJoin::getNonJoinedBlocks#101066
Fix column count assertion in HashJoin::getNonJoinedBlocks#101066groeneai wants to merge 2 commits intoClickHouse:masterfrom
Conversation
Pre-PR Validation Gate (Session: cron:clickhouse-ci-task-worker:20260329-091500)a) Deterministic repro? Yes — the AST fuzzer queries from PR #100617's test (GLOBAL NOT IN with join side swap) trigger the LOGICAL_ERROR assertion deterministically in debug builds without the fix. b) Root cause explained? c) Fix matches root cause? Yes — the fix mirrors the exact d) Test intent preserved? Yes — the assertion still checks exact column count equality. No bounds widened, no tags added, no assertions removed. The only change is correct accounting for cross-side overlap. e) Both directions demonstrated? Test passes 20/20 with fix. All 6 locally-runnable join-duplicate tests (00577, 03707, 02497, 02267, 02381, 00370) that PR #100617 breaks pass 60/60 with this fix. |
|
cc @vdimir @KochetovNicolai — could you review this? It fixes the P0 recurring STID 2980-385f crash in |
|
Workflow [PR], commit [1160719] Summary: ❌
AI ReviewSummary ClickHouse Rules
Final Verdict
|
| <= toInt64(materialize(9223372036854775807)) | ||
| FORMAT Null | ||
| SETTINGS enable_analyzer = 1 | ||
| " 2>&1 | grep -v -c 'LOGICAL_ERROR' > /dev/null |
There was a problem hiding this comment.
This check is not reliable for detecting LOGICAL_ERROR.
grep -v -c 'LOGICAL_ERROR' > /dev/null succeeds whenever there is any non-matching line, so a real LOGICAL_ERROR can be masked by other stderr lines (for example Received exception from server/query context lines).
Please make this assertion explicit, e.g.:
... 2>&1 | grep -q 'LOGICAL_ERROR' && exit 1
# or
... 2>&1 | (! grep -q 'LOGICAL_ERROR')The same pattern appears again below in this file.
After join side swapping (e.g., LEFT ANTI → RIGHT ANTI via GLOBAL NOT IN), column names can appear in both the left block and sample_block_with_columns_to_add. AddedColumns correctly skips right-side columns whose renamed name already exists in the left block, but the assertion in getNonJoinedBlocks computed the expected column count as a naive sum (left + right_keys + columns_to_add) without accounting for this cross-side overlap. The fix counts right-side columns whose renamed name appears in the left block (matching the exact deduplication logic in AddedColumns) and subtracts this overlap from the expected count. Unlike the NameSet-based approach in PR ClickHouse#100617, this preserves within-left duplicate column names which are legitimate in queries with subqueries. Fixes STID 2980-385f — P0 recurring AST fuzzer crash hitting 20+ PRs and master over the last 12 days.
- Replace NameSet::count() with NameSet::contains() for membership checks
(clang-tidy readability-container-contains diagnostic)
- Fix test assertion: use `{ ! grep -q 'LOGICAL_ERROR'; }` instead of
`grep -v -c 'LOGICAL_ERROR' > /dev/null` which could mask LOGICAL_ERROR
presence when other stderr lines exist
5b6ba62 to
1160719
Compare
LLVM Coverage Report
Changed lines: 90.00% (27/30) · Uncovered code |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix
Logical error: Unexpected number of columns in result sample blockcrash inHashJoin::getNonJoinedBlockswhen column names overlap between left and right sides after join side swapping (e.g., withGLOBAL NOT IN).Description
After join side swapping (e.g., LEFT ANTI → RIGHT ANTI via
GLOBAL NOT IN), column names can appear in both the left block andsample_block_with_columns_to_add. DuringjoinBlock,AddedColumnscorrectly skips right-side columns whose renamed name already exists in the left block (seeAddedColumnsconstructor, line 154 ofAddedColumns.h). However, the assertion ingetNonJoinedBlockscomputed the expected column count as a naive sum (left + right_keys + columns_to_add) without accounting for this cross-side overlap, causing aLOGICAL_ERRORcrash.Root cause: The assertion at line 1711 of
HashJoin.cppassumedresult_sample_block.columns() == left_columns_count + required_right_keys.columns() + sample_block_with_columns_to_add.columns(), but when column names overlap across sides, the result block has fewer columns becauseAddedColumnsdeduplicates them.Fix: Count right-side columns (from both
sample_block_with_columns_to_addandrequired_right_keys) whose renamed name appears in the left block, matching the exact deduplication logic inAddedColumns, and subtract this overlap from the expected count.Why not PR #100617's approach: That PR uses
NameSetcomparison which also deduplicates within-left duplicates. Queries with subqueries can legitimately produce duplicate column names on the same side (e.g.,__table2appearing twice in the left block). TheNameSetapproach causesexpected_names.size() < result.columns()for such queries, breaking 7 existing join tests (00577, 03707, 02497, 02267, 00217, 02381, 00370). Our cross-side overlap approach preserves within-left duplicates while correctly handling cross-side overlap.Impact: STID 2980-385f — P0 recurring AST fuzzer crash hitting 20+ unrelated PRs and master over the last 12 days.
Verification:
04068_hashjoin_column_overlap_assertion— 20/20 passes with randomization🤖 Generated with Claude Code