Allow Iceberg table engine to point to a specific metadata file

The current implementation of the Iceberg table engine takes a URL with the path to an existing Iceberg table. The example given in the documentation is `http://test.s3.amazonaws.com/clickhouse-bucket/`. The engine calls `S3DataLakeMetadataReadHelper::listFilesMatchSuffix` to list all `*metadata.json` files in the `metadata` directory (this directory name is hard-coded) under this base path, and then picks the metadata file that sorts last in lexicographical order.
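The selection rule described above can be sketched in a few lines. This is only an illustration in Python, not the engine's actual C++ code: the function name `pick_latest_metadata` and the example keys are hypothetical, and the sketch assumes the listing has already been fetched into a plain list of object keys.

```python
def pick_latest_metadata(keys, base_path):
    """Return the metadata file that sorts last under <base_path>/metadata/.

    `keys` is an already-fetched listing of object keys (hypothetical input).
    """
    # The directory name "metadata" is fixed, mirroring the hard-coded name.
    prefix = base_path.rstrip("/") + "/metadata/"
    candidates = [
        key for key in keys
        if key.startswith(prefix) and key.endswith("metadata.json")
    ]
    if not candidates:
        raise FileNotFoundError(f"no *metadata.json files under {prefix}")
    # max() on strings compares lexicographically, so this is the "last" file.
    return max(candidates)


# Hypothetical listing for illustration only.
example_keys = [
    "clickhouse-bucket/metadata/00000-a.metadata.json",
    "clickhouse-bucket/metadata/00002-c.metadata.json",
    "clickhouse-bucket/metadata/00001-b.metadata.json",
    "clickhouse-bucket/metadata/snap-1.avro",
    "clickhouse-bucket/data/part-0.parquet",
]
latest = pick_latest_metadata(example_keys, "clickhouse-bucket/")
```

Note that the caller has no way to influence which file is chosen: the result depends entirely on the listing and on string ordering.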

There are a few problems with this implementation:

* The name of the metadata directory is unnecessarily hard-coded. It would be better if the `url` parameter directly specified the path to the metadata directory, not its parent directory. Unfortunately, changing this would break compatibility for current users of the Iceberg table engine.
* Listing the metadata files involves one or more S3 `ListObjectsV2` calls. This is slow when the `metadata` directory contains many files, which is often the case for Iceberg tables in the real world.
* The user cannot choose a specific metadata file. Each metadata file records the `current-snapshot-id` of an immutable snapshot of the table, so it is not possible to `CREATE TABLE` from any version of an Iceberg table other than the latest.
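To make the second and third points concrete, here is a rough Python sketch of one way the lookup could behave if the `url` were allowed to name a metadata file directly. Everything in it is an assumption made for illustration: `resolve_metadata_file`, `list_calls_needed`, and the `list_keys` callback are hypothetical names, not ClickHouse APIs. The only external fact relied on is that a single `ListObjectsV2` response returns at most 1,000 keys.

```python
import math

LIST_PAGE_SIZE = 1000  # ListObjectsV2 returns at most 1,000 keys per response


def list_calls_needed(num_objects):
    """Estimate how many ListObjectsV2 calls a full directory listing takes."""
    # Even an empty directory costs one call.
    return max(1, math.ceil(num_objects / LIST_PAGE_SIZE))


def resolve_metadata_file(url, list_keys):
    """Pick the metadata file for a table URL (hypothetical behaviour).

    `list_keys(prefix)` is an assumed callback returning all keys under prefix.
    """
    if url.endswith(".metadata.json"):
        # The user pinned a specific file: no listing is needed at all.
        return url
    # Otherwise fall back to the existing behaviour: list and take the last.
    prefix = url.rstrip("/") + "/metadata/"
    candidates = [k for k in list_keys(prefix) if k.endswith("metadata.json")]
    if not candidates:
        raise FileNotFoundError(f"no *metadata.json files under {prefix}")
    return max(candidates)


# Hypothetical in-memory stand-in for S3, counting how often it is listed.
calls = []

def fake_list_keys(prefix):
    calls.append(prefix)
    return [prefix + "00000-a.metadata.json", prefix + "00001-b.metadata.json"]

pinned = resolve_metadata_file(
    "clickhouse-bucket/metadata/00000-a.metadata.json", fake_list_keys
)
latest = resolve_metadata_file("clickhouse-bucket/", fake_list_keys)
```

In this sketch the pinned path never triggers a listing, while the fallback path keeps today's behaviour, so existing table definitions that pass the base path would continue to work.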

Allow Iceberg table engine to point to a specific metadata file #47412

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Allow Iceberg table engine to point to a specific metadata file #47412

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions