Fix exception "Column identifier is already registered" in planner#101048
Fix exception "Column identifier is already registered" in planner#101048alexey-milovidov wants to merge 11 commits intomasterfrom
Conversation
During AST fuzzer stress testing, `prepareBuildQueryPlanForTableExpression` could attempt to register the same column identifier twice when the same table expression was processed multiple times with an empty column list. Use `createColumnIdentifierOrGet` to avoid the duplicate registration, and guard `addColumn` with a `hasColumn` check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
src/Planner/PlannerContext.cpp
Outdated
|
|
||
| const auto & source_alias = column_source_node->getAlias(); | ||
| if (!source_alias.empty()) | ||
| column_identifier = source_alias + "." + column.name; |
There was a problem hiding this comment.
This is wrong - missing quoting.
|
Workflow [PR], commit [b12d15e] Summary: ❌
AI ReviewSummaryThis PR fixes a planner Missing context
Findings💡 Nits
ClickHouse Rules
Final Verdict
|
src/Planner/PlannerContext.cpp
Outdated
There was a problem hiding this comment.
Same here. Use backQuoteIfNeed
|
Add a test. |
Apply proper quoting with `backQuoteIfNeed` when constructing column identifiers from source alias and column name, in both `createColumnIdentifier` and `createColumnIdentifierOrGet`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `backQuoteIfNeed` quoting applied to column identifiers in `createColumnIdentifier` and `createColumnIdentifierOrGet` breaks column lookup throughout the planner, because these identifiers are used as internal keys (not SQL identifiers). Names containing dots (e.g. `nested.key`, `u2.uid`) get backtick-quoted, causing `NOT_FOUND_COLUMN_IN_BLOCK` exceptions and EXPLAIN output mismatches. Revert to the original unquoted identifier construction, which is consistent with how the rest of the planner resolves columns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
src/Planner/PlannerContext.cpp
Outdated
|
|
||
| const auto & source_alias = column_source_node->getAlias(); | ||
| if (!source_alias.empty()) | ||
| column_identifier = source_alias + "." + column.name; |
There was a problem hiding this comment.
createColumnIdentifierOrGet builds identifiers as <alias>.<column> without quoting/escaping. This can still collide for valid quoted names containing dots (e.g. alias a.b + column c vs alias a + column b.c), so the planner may map distinct columns to the same identifier and produce incorrect behavior.
Please build the identifier with backQuoteIfNeed (same for both alias and column) or, better, factor both createColumnIdentifier and createColumnIdentifierOrGet through one helper that always formats identifiers with quoting.
The planner could throw a LOGICAL_ERROR "Column identifier is already registered" when `prepareBuildQueryPlanForTableExpression` processed a table expression multiple times with an empty selected-column list. This was found by the AST fuzzer during stress testing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| @@ -0,0 +1,10 @@ | |||
| -- Regression test: the planner could throw LOGICAL_ERROR "Column identifier is already registered" | |||
There was a problem hiding this comment.
This PR fixes planner behavior (exception Column identifier is already registered) and adds a regression test, so Changelog category should not be CI Fix or Improvement.
Please switch it to Bug Fix (or another user-facing non-CI category that matches the actual planner/runtime change).
Use `backQuoteIfNeed` for both alias and column name when building column identifiers in `createColumnIdentifier` and `createColumnIdentifierOrGet` to avoid collisions for quoted names containing dots. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `backQuoteIfNeed` change in `createColumnIdentifier` causes EXPLAIN output to backtick-quote column identifiers containing dots, parens, or numeric names (e.g. `__table1.json.a` → `__table1.\`json.a\``). Update reference files for affected tests: - 03275_subcolumns_in_primary_key - 03302_analyzer_distributed_filter_push_down - 03375_l2_distance_transposed_partial_reads_pass - 03405_merge_filter_into_join - 03445_subcolumns_prewhere_pushdown - 03786_outer_to_semi_or_anti Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract the shared identifier-building logic from `createColumnIdentifier` and `createColumnIdentifierOrGet` into a static `buildColumnIdentifier` helper, as suggested in review. Fix the `03375_l2_distance_transposed_partial_reads_pass` reference file where the EXPLAIN output truncation point for `cosineDistanceTransposed` was off by 8 characters (due to backtick-quoted column names making the line longer). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…istance_transposed_partial_reads_pass\` The EXPLAIN output truncation point for the \`L2DistanceTransposed\` function result name was off — the backtick-quoted column identifiers make the line longer, so the fixed-width truncation cuts deeper into the result name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eIfNeed` in column identifiers The `backQuoteIfNeed` change in `buildColumnIdentifier` causes EXPLAIN output to backtick-quote column identifiers containing dots (e.g. `__table1.json.a` → `__table1.\`json.a\``). This test was missed in the earlier reference update. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…te-column-identifier
LLVM Coverage Report
Changed lines: 100.00% (28/28) · Uncovered code |
During AST fuzzer stress testing,
prepareBuildQueryPlanForTableExpressioncould attempt to register the same column identifier twice when the same table expression was processed multiple times with an empty column list. UsecreateColumnIdentifierOrGetto avoid the duplicate registration, and guardaddColumnwith ahasColumncheck.Changelog category (leave one):