Skip to content

Iceberg: support external paths in tables#90740

Open
zvonand wants to merge 56 commits intoClickHouse:masterfrom
zvonand:iceberg-allow-external-paths
Open

Iceberg: support external paths in tables#90740
zvonand wants to merge 56 commits intoClickHouse:masterfrom
zvonand:iceberg-allow-external-paths

Conversation

@zvonand
Copy link
Copy Markdown
Contributor

@zvonand zvonand commented Nov 24, 2025

Closes #84609

Current logic is that all table files (data files, manifests, manifest lists) shall be "inside" the table location, in the same storage. This PR abandons that logic: now files can be located anywhere, even on a different storage type (e.g. all metadata is on s3, data file in in local storage).

In some cases, that old logic shoots back: even incorrect (non-existing) paths, i.e. with redundant items in a prefix were parsed in such a way that files from another locations were read (/path/to/item was read when the specified location was /path/to/to/item or vice versa)

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Support Iceberg tables that have files outside table location or on different storage.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

High Risk
Touches Iceberg metadata iteration/compaction/expiry and distributed cluster-function serialization, plus adds dynamic secondary storage creation and optional credential propagation; regressions could lead to wrong files being read or deleted across storages/endpoints.

Overview
Enables Iceberg tables to reference absolute paths and files outside the table location, including files stored on different object storage backends (S3/Azure/HDFS/local). Iceberg manifest/manifest-list handling is updated to treat metadata paths as absolute URIs, resolve them at read/delete time via new object-storage utilities, and carry resolved storage info through iterators/transforms/compaction/expire-snapshots.

Bumps cluster-function processing protocol to add Iceberg absolute-path support and extends IcebergObjectSerializableInfo to transmit both the metadata path and absolute path (with a coordinator-side guard for old workers when external storage is required). Also adds _path/task distribution handling to prefer absolute paths when available.

Introduces new object-storage helpers (SchemeAuthorityKey, makeAbsolutePath, resolveObjectStorageForPath, SecondaryStorages) and a setting s3_propagate_credentials_to_other_storages to optionally copy base S3 credentials when creating secondary storages. Adds an integration test covering multi-storage Iceberg layouts and updates existing test metadata paths accordingly.

Written by Cursor Bugbot for commit 66d1acc. This will update automatically on new commits. Configure here.

@zvonand zvonand force-pushed the iceberg-allow-external-paths branch from 6e66e6d to 8099db2 Compare November 24, 2025 16:17
@zvonand zvonand changed the title Iceberg: support external paths Iceberg: support external paths in tables Nov 24, 2025
@zvonand

This comment was marked as outdated.

@zvonand zvonand force-pushed the iceberg-allow-external-paths branch 3 times, most recently from 911aa6d to b2ebefa Compare November 25, 2025 18:40
@kssenii kssenii added the can be tested Allows running workflows for external contributors label Nov 26, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Nov 26, 2025

Workflow [PR], commit [2863324]

Summary:

job_name test_name status info comment
Stateless tests (arm_asan_ubsan, targeted) failure
03758_fix_isssue_87414 FAIL cidb
00534_functions_bad_arguments3 FAIL cidb
Integration tests (amd_asan_ubsan, targeted) failure
test_backup_restore_on_cluster/test_concurrency.py::test_create_or_drop_tables_during_backup[Replicated-ReplicatedMergeTree] FAIL cidb
test_backup_restore_on_cluster/test_concurrency.py::test_create_or_drop_tables_during_backup[Memory-MergeTree] FAIL cidb
test_backup_restore_on_cluster/test_concurrency.py::test_kill_mutation_during_backup FAIL cidb
Integration tests (amd_binary, 2/5) failure
test_storage_kafka/test_batch_fast.py::test_kafka_commit_on_block_write[generate_old_create_table_query] FAIL cidb
Integration tests (amd_llvm_coverage, 2/5) failure
test_overcommit_tracker/test.py::test_user_overcommit FAIL cidb
Stress test (amd_debug) failure
Hung check failed, possible deadlock found FAIL cidb
Stress test (amd_tsan) failure
Logical error: '(isConst() || isSparse() || isReplicated()) ? getDataType() == rhs.getDataType() : typeid(*this) == typeid(rhs)' (STID: 2508-324f) FAIL cidb
Stress test (arm_msan) failure
MemorySanitizer: use-of-uninitialized-value (STID: 2410-47e0) FAIL cidb

AI Review

Summary

This PR enables Iceberg to read metadata/data/delete files from absolute paths and external storages, including distributed execution support. The main remaining concern is a server-side local file access bypass: file:// path validation can be circumvented through symlink traversal under user_files, which keeps arbitrary local reads reachable from untrusted Iceberg metadata.

Missing context
  • ⚠️ No CI runtime logs/failed-job artifacts were provided in this review context.
Findings

❌ Blockers

  • [src/Storages/ObjectStorage/Utils.cpp:535] file:// validation relies on lexical normalization and prefix checks before creating LocalObjectStorage, but reads later happen via raw filesystem open. A symlink under user_files (for example, user_files/link -> /etc) allows file:///.../user_files/link/... to pass validation and read outside user_files.
    Suggested fix: enforce canonical-path validation (resolve symlinks) for final target, or reject symlink components on the path before opening.
Tests
  • ⚠️ Add an integration test for Iceberg file:// paths with a symlink inside user_files pointing outside, and assert PATH_ACCESS_DENIED.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout ⚠️ Security hardening needed before safe rollout for local paths.
Compilation time
Performance & Safety
  • Local-file safety regression risk: untrusted metadata can redirect reads outside allowed roots through symlink traversal on file:// paths.
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Harden file:// path enforcement against symlink traversal in resolveObjectStorageForPath/local-read path.
    • Add an integration test covering the symlink-escape scenario.

@divanik divanik self-assigned this Nov 26, 2025
@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Nov 26, 2025
@zvonand zvonand force-pushed the iceberg-allow-external-paths branch 2 times, most recently from 08b5af5 to 338a036 Compare November 28, 2025 19:54
@divanik divanik marked this pull request as draft December 2, 2025 14:15
@zvonand
Copy link
Copy Markdown
Contributor Author

zvonand commented Dec 4, 2025

Code works and test works, but test_storage_iceberg_with_spark/test_cluster_table_function.py::test_cluster_table_function_split_by_row_groups[1-100-s3-1] failed once

Looks like test itself is flaky: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=b3f2b27c84f775077c1249dc87192bec58b4982a&name_0=MasterCI&name_1=Integration%20tests%20%28amd_tsan%2C%203%2F6%29&name_1=Integration%20tests%20%28amd_tsan%2C%203%2F6%29

PR with fix: #91468

@zvonand
Copy link
Copy Markdown
Contributor Author

zvonand commented Dec 4, 2025

Other test fails:

  • 02126_dist_desc -- flaky
  • 00816_long_concurrent_alter_column -- flaky
  • test_storage_rabbitmq/test.py::test_hiding_credentials -- flaky for a long time
  • test_storage_s3_queue/test_5.py::test_failed_startup -- flaky

@zvonand zvonand marked this pull request as ready for review December 4, 2025 11:11
/// Object metadata: size, modification time, etc.
std::optional<ObjectMetadata> metadata;
std::optional<String> absolute_path;
ObjectStoragePtr object_storage_to_use = nullptr;
Copy link
Copy Markdown
Member

@CheSema CheSema Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like this idea.

This feature is related to the iceberg but it brings complexity to the interface which makes other realization overcomplicated.

Lest make the code in Iceberg resolve external links.

Copy link
Copy Markdown
Contributor Author

@zvonand zvonand Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok in this case

brings complexity to the interface

  1. PathWithMetadata is not part of IObjectStorage interface at all -- it's a standalone helper struct. It can even be moved to a standalone header and impl. This structure is perfect for this logic. Adding an optional field doesn't complicate it.
  2. The alternative will be way messier -- juggling <path, storage> tuples everywhere, doing repeated lookups for every file, introducing some cache that will resolve path into object storage and key, or even introducing a struct that will be almost similar and serve for the same purpose. Neither of these will help make it less complicated. Encapsulating it here is clean and natural.
  3. The new logic in this struct is completely optional -- in other places (where we don't need it) it can be used old way, existing code is unaffected.

Copy link
Copy Markdown
Member

@CheSema CheSema Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you know what to do: p.2.

My opinion that 1k rows for the particular feature in the designated place it much better then 1k rows scattered every where which you have to ignore in the most cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, but here we do not have that "ideal" case: the intersection of Iceberg and generic object storage logic and other server logic is large, it is not always encapsulated. There are already many places in code where we have a generic workflow but have specific branching for Iceberg, e.g. like this simple one. Iceberg module does not work with object storages and data files at lower level, that logic remains outside Iceberg.

Current code is actually quite isolated (not 100%, but maybe 80%). Most of the changes outside Iceberg territory is renaming RelativePathWithMetadata -> PathWithMetadata (and also corresponding variables), which does not in any way worsen the code itself. There is also passing additional variables to methods. Now they are only used in Iceberg workflow, but if at some point we need similar multi-storage data lake logic somewhere else -- they will be useful (I do not know whether it will or won't happen, but it may). Sometimes these arguments are passed and checked in an if -- which does not affect other code that does not pass those.

tl;dr: Sure, this code can be improved to be isolated better. This would involve a lot of changes to current structure, but I am not sure this is justified. Iceberg module heavily delegates actual data reading and interaction with object storages outside (which is IMO totally right), to our usual reading-from-object-storage classes, iterators and methods.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm agree with @CheSema, let's cleanup this

@CheSema CheSema requested a review from Copilot December 18, 2025 10:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables Iceberg tables to support files stored outside the table's primary location, including on different storage types (e.g., metadata on S3, data files on local storage). The implementation abandons the previous constraint that all table files must be within the same storage location.

Key changes:

  • Added support for resolving absolute file paths to appropriate storage backends
  • Introduced SecondaryStorages mechanism to manage multiple storage backends per table
  • Extended metadata structures to track absolute paths alongside relative paths

Reviewed changes

Copilot reviewed 69 out of 70 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/Storages/ObjectStorage/Utils.h/cpp Core logic for resolving paths to storage backends, including resolveObjectStorageForPath() and SchemeAuthorityKey
src/Storages/ObjectStorage/DataLakes/Iceberg/*.h/cpp Updated Iceberg components to use absolute paths and secondary storages
src/Storages/ObjectStorage/StorageObjectStorageSource.cpp Modified source to resolve and use secondary storages for reading
src/Disks/DiskObjectStorage/ObjectStorages/IObjectStorage.h Renamed RelativePathWithMetadata to PathWithMetadata, added absolute path support
src/Core/ProtocolDefines.h Added protocol version for Iceberg absolute paths in cluster processing
tests/integration/test_storage_iceberg_multistorage/ New integration tests for multi-storage scenarios
tests/queries/0_stateless/data_minio/*/metadata/v1.metadata.json Updated test fixtures to reflect corrected table locations

@scanhex12 scanhex12 self-assigned this Dec 20, 2025
@zvonand zvonand force-pushed the iceberg-allow-external-paths branch from 53123da to c52f465 Compare December 22, 2025 16:36
/// Object metadata: size, modification time, etc.
std::optional<ObjectMetadata> metadata;
std::optional<String> absolute_path;
ObjectStoragePtr object_storage_to_use = nullptr;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm agree with @CheSema, let's cleanup this

class IObjectStorage;

/// Thread-safe wrapper for secondary object storages map
struct SecondaryStorages
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Secondary sounds strange in this case. How do you understand that object storage is primary or secondary?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why don't we need to put "primary" storage to this map with secondary storages?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the logic itself:

We have our "main" object storage, where we read from by default. But it may happen that we stumble upon a file that belongs to another storage -- thus that storage will be socondary.

@zvonand

This comment was marked as outdated.

@divanik divanik removed their assignment Dec 29, 2025
@zvonand zvonand force-pushed the iceberg-allow-external-paths branch 3 times, most recently from aa70623 to fd7b030 Compare January 5, 2026 21:55
@zvonand zvonand force-pushed the iceberg-allow-external-paths branch 3 times, most recently from 7a84163 to dc9420d Compare January 6, 2026 22:37
@zvonand zvonand force-pushed the iceberg-allow-external-paths branch from e608ff6 to 5a3e0bf Compare March 11, 2026 23:14
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

@rienath
Copy link
Copy Markdown
Member

rienath commented Mar 13, 2026

Please fix the conflict and merge master to make the stress test happy (it was fixed)

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

/// Without this check, metadata could drive reads from arbitrary local paths.
if (target_scheme_normalized == "file")
{
ensure_local_path_inside_user_files(target_decomposed.key);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file:// paths are currently constrained to user_files, but this check can be bypassed through symlinks: ensure_local_path_inside_user_files validates the target with lexically_normal, while the created LocalObjectStorage later reads remote_path directly (createReadBufferFromFileBase) without resolving symlinks.

A concrete trace:

  1. user_files = /var/lib/clickhouse/user_files.
  2. Metadata path = file:///var/lib/clickhouse/user_files/link/secret.parquet, where link -> /etc.
  3. ensure_local_path_inside_user_files sees lexical path /var/lib/clickhouse/user_files/link/secret.parquet and allows it.
  4. LocalObjectStorage::readObject opens /var/lib/clickhouse/user_files/link/secret.parquet, which dereferences to /etc/secret.parquet.

So untrusted metadata can still escape user_files via symlink traversal.

Can we harden resolveObjectStorageForPath by validating with canonicalized paths (resolving symlinks), or by rejecting any path component that is a symlink before opening?

Copy link
Copy Markdown
Contributor Author

@zvonand zvonand Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be secured in #99405. no need to invent bicycles here.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 30, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.20% 84.10% -0.10%
Functions 90.90% 90.90% +0.00%
Branches 76.70% 76.60% -0.10%

Changed lines: 52.53% (644/1226) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors manual approve Manual approve required to run CI pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reading tables from the Iceberg Catalog does not work when files are split across multiple storages backend

8 participants