Skip to content

Fix exception "Column identifier is already registered" in planner#101048

Open
alexey-milovidov wants to merge 11 commits intomasterfrom
fix-planner-duplicate-column-identifier
Open

Fix exception "Column identifier is already registered" in planner#101048
alexey-milovidov wants to merge 11 commits intomasterfrom
fix-planner-duplicate-column-identifier

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

During AST fuzzer stress testing, prepareBuildQueryPlanForTableExpression could attempt to register the same column identifier twice when the same table expression was processed multiple times with an empty column list. Use createColumnIdentifierOrGet to avoid the duplicate registration, and guard addColumn with a hasColumn check.

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

During AST fuzzer stress testing, `prepareBuildQueryPlanForTableExpression`
could attempt to register the same column identifier twice when the same
table expression was processed multiple times with an empty column list.
Use `createColumnIdentifierOrGet` to avoid the duplicate registration,
and guard `addColumn` with a `hasColumn` check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

const auto & source_alias = column_source_node->getAlias();
if (!source_alias.empty())
column_identifier = source_alias + "." + column.name;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong - missing quoting.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 28, 2026

Workflow [PR], commit [b12d15e]

Summary:

job_name test_name status info comment
Integration tests (amd_llvm_coverage, 2/5) failure
test_overcommit_tracker/test.py::test_user_overcommit FAIL cidb
Stress test (arm_msan) failure
MemorySanitizer: use-of-uninitialized-value (STID: 1003-358c) FAIL cidb, issue
Integration tests (amd_asan_ubsan, db disk, old analyzer, 1/6) error
Integration tests (amd_asan_ubsan, db disk, old analyzer, 3/6) error
Integration tests (amd_asan_ubsan, db disk, old analyzer, 5/6) error
Integration tests (amd_asan_ubsan, db disk, old analyzer, 6/6) error
Integration tests (amd_binary, 2/5) error
Integration tests (amd_binary, 3/5) error
Integration tests (amd_binary, 4/5) error
Integration tests (amd_tsan, 1/6) error

AI Review

Summary

This PR fixes a planner LOGICAL_ERROR (Column identifier is already registered) by avoiding duplicate identifier registration for repeated table-expression processing and adds a regression test. The core fix in PlannerJoinTree/PlannerContext looks correct and covered by a focused stateless test. I found one process-level issue: PR metadata uses an incorrect changelog category for a real bug fix.

Missing context
  • ⚠️ No CI run results/logs were provided in this review context, so I could not validate runtime signals beyond static inspection and test changes.
Findings

💡 Nits

  • [PR description] Changelog category is currently CI Fix or Improvement, but this PR fixes a user-visible planner exception and adds a regression test; category should be Bug Fix.
    • Suggested replacement:
      • ### Changelog category (leave one):
      • - Bug Fix (user-visible misbehavior in planner; changelog entry required)
    • A similar inline comment is already posted by clickhouse-gh[bot], so no duplicate inline comment was posted.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality ⚠️ Changelog category does not match the actual change type (Bug Fix).
Safe rollout
Compilation time
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Update PR template metadata to use the correct Changelog category (Bug Fix) and provide the required user-facing changelog entry.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Use backQuoteIfNeed

@clickhouse-gh clickhouse-gh bot added the pr-ci label Mar 28, 2026
@alexey-milovidov
Copy link
Copy Markdown
Member Author

Add a test.

alexey-milovidov and others added 2 commits March 29, 2026 01:13
Apply proper quoting with `backQuoteIfNeed` when constructing column
identifiers from source alias and column name, in both
`createColumnIdentifier` and `createColumnIdentifierOrGet`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `backQuoteIfNeed` quoting applied to column identifiers in
`createColumnIdentifier` and `createColumnIdentifierOrGet` breaks
column lookup throughout the planner, because these identifiers are
used as internal keys (not SQL identifiers). Names containing dots
(e.g. `nested.key`, `u2.uid`) get backtick-quoted, causing
`NOT_FOUND_COLUMN_IN_BLOCK` exceptions and EXPLAIN output mismatches.

Revert to the original unquoted identifier construction, which is
consistent with how the rest of the planner resolves columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

const auto & source_alias = column_source_node->getAlias();
if (!source_alias.empty())
column_identifier = source_alias + "." + column.name;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createColumnIdentifierOrGet builds identifiers as <alias>.<column> without quoting/escaping. This can still collide for valid quoted names containing dots (e.g. alias a.b + column c vs alias a + column b.c), so the planner may map distinct columns to the same identifier and produce incorrect behavior.

Please build the identifier with backQuoteIfNeed (same for both alias and column) or, better, factor both createColumnIdentifier and createColumnIdentifierOrGet through one helper that always formats identifiers with quoting.

The planner could throw a LOGICAL_ERROR "Column identifier is already
registered" when `prepareBuildQueryPlanForTableExpression` processed a
table expression multiple times with an empty selected-column list.
This was found by the AST fuzzer during stress testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,10 @@
-- Regression test: the planner could throw LOGICAL_ERROR "Column identifier is already registered"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR fixes planner behavior (exception Column identifier is already registered) and adds a regression test, so Changelog category should not be CI Fix or Improvement.

Please switch it to Bug Fix (or another user-facing non-CI category that matches the actual planner/runtime change).

alexey-milovidov and others added 6 commits March 29, 2026 05:57
Use `backQuoteIfNeed` for both alias and column name when building
column identifiers in `createColumnIdentifier` and `createColumnIdentifierOrGet`
to avoid collisions for quoted names containing dots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `backQuoteIfNeed` change in `createColumnIdentifier` causes EXPLAIN
output to backtick-quote column identifiers containing dots, parens, or
numeric names (e.g. `__table1.json.a` → `__table1.\`json.a\``).

Update reference files for affected tests:
- 03275_subcolumns_in_primary_key
- 03302_analyzer_distributed_filter_push_down
- 03375_l2_distance_transposed_partial_reads_pass
- 03405_merge_filter_into_join
- 03445_subcolumns_prewhere_pushdown
- 03786_outer_to_semi_or_anti

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract the shared identifier-building logic from `createColumnIdentifier`
and `createColumnIdentifierOrGet` into a static `buildColumnIdentifier`
helper, as suggested in review.

Fix the `03375_l2_distance_transposed_partial_reads_pass` reference file
where the EXPLAIN output truncation point for `cosineDistanceTransposed`
was off by 8 characters (due to backtick-quoted column names making the
line longer).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…istance_transposed_partial_reads_pass\`

The EXPLAIN output truncation point for the \`L2DistanceTransposed\`
function result name was off — the backtick-quoted column identifiers
make the line longer, so the fixed-width truncation cuts deeper into
the result name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eIfNeed` in column identifiers

The `backQuoteIfNeed` change in `buildColumnIdentifier` causes EXPLAIN
output to backtick-quote column identifiers containing dots
(e.g. `__table1.json.a` → `__table1.\`json.a\``). This test was missed
in the earlier reference update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.70% 76.70% +0.00%

Changed lines: 100.00% (28/28) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant