Iceberg: support external paths in tables#90740
zvonand wants to merge 56 commits into ClickHouse:master
Conversation
Workflow [PR], commit [2863324] Summary: ❌

AI Review Summary: This PR enables Iceberg to read metadata/data/delete files from absolute paths and external storages, including distributed execution support. The main remaining concern is a server-side local file access bypass: Missing context

Findings
- ❌ Blockers
- Tests
- ClickHouse Rules
- Performance & Safety
- Final Verdict
Code works and the test works, but it looks like the test itself is flaky: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=b3f2b27c84f775077c1249dc87192bec58b4982a&name_0=MasterCI&name_1=Integration%20tests%20%28amd_tsan%2C%203%2F6%29&name_1=Integration%20tests%20%28amd_tsan%2C%203%2F6%29 PR with fix: #91468
```cpp
/// Object metadata: size, modification time, etc.
std::optional<ObjectMetadata> metadata;
std::optional<String> absolute_path;
ObjectStoragePtr object_storage_to_use = nullptr;
```
I do not like this idea.
This feature is related to Iceberg, but it brings complexity to the interface, which makes other implementations overcomplicated.
Let's make the code in Iceberg resolve external links.
I think it is ok in this case.

> brings complexity to the interface

- `PathWithMetadata` is not part of the `IObjectStorage` interface at all -- it's a standalone helper struct. It can even be moved to a standalone header and impl. This structure is perfect for this logic. Adding an optional field doesn't complicate it.
- The alternative would be way messier -- juggling `<path, storage>` tuples everywhere, doing repeated lookups for every file, introducing some cache that resolves a path into an object storage and key, or introducing a struct that is almost identical and serves the same purpose. None of these would make it less complicated. Encapsulating it here is clean and natural.
- The new logic in this struct is completely optional -- in other places (where we don't need it) it can be used the old way; existing code is unaffected.
It looks like you know what to do: p. 2.
My opinion is that 1k rows for a particular feature in a designated place is much better than 1k rows scattered everywhere that you have to ignore in most cases.
Of course, but here we do not have that "ideal" case: the intersection of Iceberg with generic object storage logic and other server logic is large, and it is not always encapsulated. There are already many places in the code where we have a generic workflow but specific branching for Iceberg, e.g. like this simple one. The Iceberg module does not work with object storages and data files at the lower level; that logic remains outside Iceberg.
The current code is actually quite isolated (not 100%, but maybe 80%). Most of the changes outside Iceberg territory are the renaming RelativePathWithMetadata -> PathWithMetadata (and the corresponding variables), which does not in any way worsen the code itself. There is also passing additional variables to methods. For now they are only used in the Iceberg workflow, but if at some point we need similar multi-storage data lake logic somewhere else, they will be useful (I do not know whether that will happen, but it may). Sometimes these arguments are passed and checked in an if, which does not affect other code that does not pass them.
tl;dr: Sure, this code can be isolated better. That would involve a lot of changes to the current structure, but I am not sure it is justified. The Iceberg module heavily delegates actual data reading and interaction with object storages outside (which is IMO totally right), to our usual reading-from-object-storage classes, iterators and methods.
Pull request overview
This PR enables Iceberg tables to support files stored outside the table's primary location, including on different storage types (e.g., metadata on S3, data files on local storage). The implementation abandons the previous constraint that all table files must be within the same storage location.
Key changes:

- Added support for resolving absolute file paths to appropriate storage backends
- Introduced `SecondaryStorages` mechanism to manage multiple storage backends per table
- Extended metadata structures to track absolute paths alongside relative paths
Reviewed changes
Copilot reviewed 69 out of 70 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/Storages/ObjectStorage/Utils.h/cpp` | Core logic for resolving paths to storage backends, including `resolveObjectStorageForPath()` and `SchemeAuthorityKey` |
| `src/Storages/ObjectStorage/DataLakes/Iceberg/*.h/cpp` | Updated Iceberg components to use absolute paths and secondary storages |
| `src/Storages/ObjectStorage/StorageObjectStorageSource.cpp` | Modified source to resolve and use secondary storages for reading |
| `src/Disks/DiskObjectStorage/ObjectStorages/IObjectStorage.h` | Renamed `RelativePathWithMetadata` to `PathWithMetadata`, added absolute path support |
| `src/Core/ProtocolDefines.h` | Added protocol version for Iceberg absolute paths in cluster processing |
| `tests/integration/test_storage_iceberg_multistorage/` | New integration tests for multi-storage scenarios |
| `tests/queries/0_stateless/data_minio/*/metadata/v1.metadata.json` | Updated test fixtures to reflect corrected table locations |
```cpp
/// Object metadata: size, modification time, etc.
std::optional<ObjectMetadata> metadata;
std::optional<String> absolute_path;
ObjectStoragePtr object_storage_to_use = nullptr;
```

```cpp
class IObjectStorage;

/// Thread-safe wrapper for secondary object storages map
struct SecondaryStorages
```
Secondary sounds strange in this case. How do you determine whether an object storage is primary or secondary?
And why don't we need to put the "primary" storage into this map with the secondary storages?
Because of the logic itself:
We have our "main" object storage that we read from by default. But it may happen that we stumble upon a file that belongs to another storage -- that storage will then be secondary.
Please fix the conflict and merge master to make the stress test happy (it was fixed)
```cpp
/// Without this check, metadata could drive reads from arbitrary local paths.
if (target_scheme_normalized == "file")
{
    ensure_local_path_inside_user_files(target_decomposed.key);
```
`file://` paths are currently constrained to `user_files`, but this check can be bypassed through symlinks: `ensure_local_path_inside_user_files` validates the target with `lexically_normal`, while the created `LocalObjectStorage` later reads `remote_path` directly (`createReadBufferFromFileBase`) without resolving symlinks.

A concrete trace:

- `user_files = /var/lib/clickhouse/user_files`.
- Metadata path = `file:///var/lib/clickhouse/user_files/link/secret.parquet`, where `link -> /etc`.
- `ensure_local_path_inside_user_files` sees the lexical path `/var/lib/clickhouse/user_files/link/secret.parquet` and allows it.
- `LocalObjectStorage::readObject` opens `/var/lib/clickhouse/user_files/link/secret.parquet`, which dereferences to `/etc/secret.parquet`.

So untrusted metadata can still escape `user_files` via symlink traversal.

Can we harden `resolveObjectStorageForPath` by validating with canonicalized paths (resolving symlinks), or by rejecting any path component that is a symlink before opening?
This will be secured in #99405; no need to reinvent the wheel here.
LLVM Coverage Report
Changed lines: 52.53% (644/1226) · Uncovered code

Closes #84609
The current logic is that all table files (data files, manifests, manifest lists) must be "inside" the table location, in the same storage. This PR abandons that logic: now files can be located anywhere, even on a different storage type (e.g. all metadata on S3, data files on local storage).

In some cases the old logic backfires: even incorrect (non-existing) paths, i.e. ones with redundant items in a prefix, were parsed in such a way that files from other locations were read (`/path/to/item` was read when the specified location was `/path/to/to/item`, or vice versa).

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Support Iceberg tables that have files outside table location or on different storage.
Documentation entry for user-facing changes
Note
High Risk
Touches Iceberg metadata iteration/compaction/expiry and distributed cluster-function serialization, plus adds dynamic secondary storage creation and optional credential propagation; regressions could lead to wrong files being read or deleted across storages/endpoints.
Overview

Enables Iceberg tables to reference absolute paths and files outside the table location, including files stored on different object storage backends (S3/Azure/HDFS/local). Iceberg manifest/manifest-list handling is updated to treat metadata paths as absolute URIs, resolve them at read/delete time via new object-storage utilities, and carry resolved storage info through iterators/transforms/compaction/expire-snapshots.

Bumps the cluster-function processing protocol to add Iceberg absolute-path support and extends `IcebergObjectSerializableInfo` to transmit both the metadata path and the absolute path (with a coordinator-side guard for old workers when external storage is required). Also adds `_path`/task distribution handling to prefer absolute paths when available.

Introduces new object-storage helpers (`SchemeAuthorityKey`, `makeAbsolutePath`, `resolveObjectStorageForPath`, `SecondaryStorages`) and a setting `s3_propagate_credentials_to_other_storages` to optionally copy base S3 credentials when creating secondary storages. Adds an integration test covering multi-storage Iceberg layouts and updates existing test metadata paths accordingly.

Written by Cursor Bugbot for commit 66d1acc. This will update automatically on new commits.