Skip to content

Merge common virtuals into table specific#100766

Open
Michicosun wants to merge 25 commits intomasterfrom
merge_common_virtuals_into_table_specific
Open

Merge common virtuals into table specific#100766
Michicosun wants to merge 25 commits intomasterfrom
merge_common_virtuals_into_table_specific

Conversation

@Michicosun
Copy link
Copy Markdown
Member

@Michicosun Michicosun commented Mar 25, 2026

Changelog category (leave one):

  • Backward Incompatible Change

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

This patch combines common_virtual_columns and virtual_columns to simplify interfaces and query analysis.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 25, 2026

Workflow [PR], commit [44d1d33]

Summary:

job_name test_name status info comment
Stateless tests (arm_asan_ubsan, targeted) failure
03203_hive_style_partitioning FAIL cidb
04042_object_storage_listing_profile_events FAIL cidb
04043_iceberg_delete_sort_order_violation FAIL cidb
03741_s3_glob_table_path_pushdown FAIL cidb
02302_s3_file_pruning FAIL cidb
02495_s3_filter_by_file FAIL cidb
Integration tests (amd_asan_ubsan, targeted) failure
test_s3_cluster/test.py::test_hive_partitioning[False-0] FAIL cidb
test_s3_cluster/test.py::test_hive_partitioning[False-1] FAIL cidb
test_s3_cluster/test.py::test_hive_partitioning[True-0] FAIL cidb
test_s3_cluster/test.py::test_hive_partitioning[True-1] FAIL cidb
test_storage_azure_blob_storage/test.py::test_filter_using_file FAIL cidb
test_storage_azure_blob_storage/test.py::test_filtering_by_file_or_path FAIL cidb
test_storage_delta/test.py::test_filtering_by_virtual_columns[1] FAIL cidb
test_storage_delta/test.py::test_filtering_by_virtual_columns[0] FAIL cidb
test_storage_delta/test.py::test_snapshot_consistency FAIL cidb
test_storage_iceberg_interoperability_azure/test_interoperability.py::test_ch_delete_spark_read FAIL cidb
1 more test cases not shown
Stateless tests (amd_asan_ubsan, distributed plan, parallel, 1/2) failure
03741_s3_glob_table_path_pushdown ERROR cidb
Server died FAIL cidb
Logical error: Invalid number of columns in chunk pushed to OutputPort. Expected A, found B (STID: 2270-38be) FAIL cidb
Stateless tests (amd_asan_ubsan, distributed plan, parallel, 2/2) failure
04042_object_storage_listing_profile_events FAIL cidb
04043_iceberg_delete_sort_order_violation FAIL cidb
03203_hive_style_partitioning FAIL cidb
Stateless tests (amd_debug, parallel) failure
Server died FAIL cidb
Logical error: Invalid number of columns in chunk pushed to OutputPort. Expected A, found B (STID: 2270-38be) FAIL cidb
Stateless tests (amd_tsan, parallel, 1/2) failure
Server died FAIL cidb
Logical error: Invalid number of columns in chunk pushed to OutputPort. Expected A, found B (STID: 2270-3184) FAIL cidb
Stateless tests (amd_tsan, parallel, 2/2) failure
04043_iceberg_delete_sort_order_violation FAIL cidb
04042_object_storage_listing_profile_events FAIL cidb
03203_hive_style_partitioning FAIL cidb
Stateless tests (arm_binary, parallel) failure
Server died FAIL cidb
03203_hive_style_partitioning UNKNOWN cidb
04043_iceberg_delete_sort_order_violation UNKNOWN cidb
03741_s3_glob_table_path_pushdown UNKNOWN cidb
Logical error: Invalid number of columns in chunk pushed to OutputPort. Expected A, found B (STID: 2270-38be) FAIL cidb
Stateless tests (amd_llvm_coverage, 1/3) failure
Server died FAIL cidb
Logical error: Invalid number of columns in chunk pushed to OutputPort. Expected A, found B (STID: 2270-38be) FAIL cidb
Stateless tests (amd_llvm_coverage, 2/3) failure
02495_s3_filter_by_file FAIL cidb

AI Review

Summary

This PR migrates common virtual column handling (notably _table) from global storage-level lists to per-storage virtual registration and shared materialization helpers. The direction is good, but there are several correctness/lifetime issues in the new common-virtual plumbing that can produce undefined behavior, inconsistent headers vs read requests, and changed _table semantics in planner paths. Also, PR metadata is mismatched (Backward Incompatible Change does not reflect the actual change scope), and the changelog text is too vague.

PR Metadata

⚠️ Changelog category is not semantically correct. This is not clearly a user-visible backward-incompatible change; it is mostly internal refactoring + virtual-column behavior fixes/regressions.

Suggested replacement:
- Improvement

⚠️ Changelog entry quality is too vague. It should state user-facing impact on _table and query paths.

Suggested replacement:
Unified common and storage-specific virtual column handling; fixed _table materialization across multiple storages and query-plan paths, including virtual-only projections and system/storage read consistency.

Missing context
  • ⚠️ No CI logs or Praktika report links were provided in the review context.
Findings

❌ Blockers

  • [src/Storages/VirtualColumnUtils.cpp:792-795] extendWithCommonVirtualColumns(Pipe &&, ...) captures expression by reference inside addSimpleTransform. The lambda runs after function return, so the capture dangles and may trigger undefined behavior.

    • Suggested fix: capture by value, e.g. [expression](const SharedHeader & header) { return std::make_shared<ExpressionTransform>(header, expression); }.
  • [src/Storages/VirtualColumnUtils.cpp:805-820] filterCommonVirtualColumns can observe metadata that differs from the active read snapshot if not consistently wired with the query snapshot at call sites; fallback selection (result.empty()) can then choose a physical column inconsistent with the read header, leading to Unknown identifier exceptions in races.

    • Suggested fix: require/use the same StorageSnapshot (or immutable metadata from it) end-to-end at every call site and avoid independent metadata lookups.
  • [src/Storages/VirtualColumnUtils.cpp:720] _table materialization now always uses storage->getStorageID().getTableName(), which can diverge from temporary/external/CTE-visible names previously used in planner flow.

    • Impact: behavior regression for temporary table naming semantics.
    • Suggested fix: pass table-expression visible name/temporary alias into common virtual materialization for planner/analyzer table-expression reads.

⚠️ Majors

  • [src/Storages/IStorage.h:228-237] IStorage.h (high fan-out header) now constructs common virtual data types inline in setVirtuals, increasing transitive compile-time costs.
    • Suggested fix: move common virtual registration helper to IStorage.cpp (declaration only in header).

💡 Nits

  • [src/Storages/StorageBuffer.cpp:314] Typo: dest_colummdest_column.
Tests
  • ⚠️ Add/restore focused coverage for temporary/external table naming with _table (old analyzer + analyzer), because current test updates removed the explicit temporary-table checks that used to validate this behavior.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal ⚠️ Some assertions around temporary-table _table behavior were removed from existing coverage.
Experimental gate
No magic constants
Backward compatibility ⚠️ _table naming semantics appear changed for temp/external/CTE paths.
SettingsChangesHistory.cpp
PR metadata quality ⚠️ Category mismatch and vague changelog entry.
Safe rollout ⚠️ Lifetime/header consistency issues in common-virtual materialization need fixes before safe rollout.
Compilation time ⚠️ Non-trivial common-virtual type construction moved into IStorage.h.
Performance & Safety
  • Safety concern: the lambda lifetime bug in extendWithCommonVirtualColumns(Pipe &&, ...) is a real UB risk under normal execution.
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    1. Fix dangling capture in extendWithCommonVirtualColumns(Pipe &&, ...).
    2. Ensure snapshot-consistent common-virtual filtering/materialization in all call paths.
    3. Restore correct _table naming for temporary/external/CTE contexts.
    4. Update PR metadata (Changelog category + specific user-facing Changelog entry).

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Mar 25, 2026
@Michicosun Michicosun added the pr-backward-incompatible Pull request with backwards incompatible changes label Mar 26, 2026
@clickhouse-gh clickhouse-gh bot removed the pr-backward-incompatible Pull request with backwards incompatible changes label Mar 26, 2026
@clickhouse-gh clickhouse-gh bot added pr-backward-incompatible Pull request with backwards incompatible changes and removed pr-not-for-changelog This PR should not be mentioned in the changelog labels Mar 26, 2026
if (!needMaterializeCommonVirtualColumns(pipe.getHeader(), storage->getVirtualsPtr(), requested_columns))
return pipe;

auto dag = constructMaterializingCommonVirtualColumnsDAG(pipe.getHeader(), requested_columns, storage);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extendWithCommonVirtualColumns(Pipe &&, ...) creates expression as a local variable, then captures it by reference in pipe.addSimpleTransform([&](...) { ... expression ... }).

That lambda is executed after this function returns, so the reference to expression is dangling. This is a lifetime bug and can lead to undefined behavior.

Please capture expression by value instead (for example [expression](const SharedHeader & header) { ... }).

const auto & our_columns = metadata_snapshot->getColumns();
auto dest_columm = dest_columns.tryGetColumnOrSubcolumn(GetColumnsOptions::AllPhysical, column_name);
return dest_columm && dest_columm->type->equals(*our_columns.getColumnOrSubcolumn(GetColumnsOptions::AllPhysical, column_name).type);
auto dest_columm = destination_columns.tryGetByName(column_name);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: dest_columm should be dest_column for readability and grep-ability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backward-incompatible Pull request with backwards incompatible changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants