Conversation
|
Workflow [PR], commit [74881e4] Summary: ❌
AI ReviewSummaryThis PR adds support for native common virtuals in Findings💡 Nits
ClickHouse Rules
Final Verdict
|
src/Storages/StorageSnapshot.cpp
Outdated
| { | ||
| auto all_columns = metadata->getColumns().get(options); | ||
| const auto & common_virtual_columns = IStorage::getCommonVirtuals(); | ||
| const auto common_virtual_columns = storage.getCommonVirtuals(); |
There was a problem hiding this comment.
StorageSnapshot is documented as a snapshot that "fixes set columns" for a query, but here common virtuals are fetched from storage.getCommonVirtuals() on every call instead of being captured at snapshot creation.
Since setVirtuals updates both virtuals and common_virtuals, a concurrent metadata/virtuals update can make one StorageSnapshot observe old virtual_columns + new common_virtuals (or vice versa), causing inconsistent column visibility (e.g. transient loss/duplication of _table).
Please store common virtuals inside StorageSnapshot (similar to virtual_columns) and use that stored pointer in all StorageSnapshot methods.
| if (std::dynamic_pointer_cast<StorageDistributed>(storage)) | ||
| return false; | ||
|
|
||
| if (storage->getVirtualsPtr()->has("_table")) |
There was a problem hiding this comment.
extractRequiredNonTableColumnsFromStorage uses storage->getVirtualsPtr()->has("_table") (current storage state) instead of storage_snapshot->virtual_columns (the captured snapshot used for planning/reading).
If virtual columns change concurrently, this branch can decide to keep _table in storage_column_names while the StorageSnapshot used for the read still doesn't expose _table, causing inconsistent planning/header behavior.
Please base this check on storage_snapshot->virtual_columns->has("_table") (same snapshot object used for the rest of the function). The same fix should be applied in MutationsInterpreter::extractRequiredNonTableColumnsFromStorage.
| if (std::dynamic_pointer_cast<StorageDistributed>(storage)) | ||
| return false; | ||
|
|
||
| if (storage->getVirtualsPtr()->has("_table")) |
There was a problem hiding this comment.
Same snapshot-consistency issue as in planner path: this check reads storage->getVirtualsPtr() (live mutable state) while the function otherwise relies on storage_snapshot.
If virtual columns are updated concurrently, _table handling can diverge from what storage_snapshot actually contains, causing inconsistent mutation read plan/header behavior.
Please use storage_snapshot->virtual_columns->has("_table") here as well.
| VirtualColumnsDescription virtuals_desc; | ||
| virtuals_desc.addEphemeral("_path", std::make_shared<DataTypeLowCardinality>(std::make_shared<DataTypeString>()), ""); | ||
| virtuals_desc.addEphemeral("_file", std::make_shared<DataTypeLowCardinality>(std::make_shared<DataTypeString>()), ""); | ||
| virtuals_desc.addEphemeral("_table", std::make_shared<DataTypeLowCardinality>(std::make_shared<DataTypeString>()), ""); |
There was a problem hiding this comment.
This change introduces a new user-visible virtual column _table for StorageHive (via setVirtuals and value materialization in getResultChunk). Given that behavior change, the PR template's Changelog category being Not for changelog looks mismatched. Please switch to an appropriate category and add a concrete user-facing Changelog entry describing _table support for Hive reads.
LLVM Coverage Report
Changed lines: 0.00% (0/111) · Uncovered code |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Part of #100766