Allow Iceberg table engine to point to a specific metadata file #47412
Closed
Description
The current implementation of the Iceberg table engine takes a URL with the path to an existing Iceberg table. The example given in the documentation is http://test.s3.amazonaws.com/clickhouse-bucket/. The engine calls `S3DataLakeMetadataReadHelper::listFilesMatchSuffix` to list all `*metadata.json` files in the `metadata` directory under this base path (the directory name is hard-coded), and then picks the metadata file that sorts last in lexicographical order.
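The selection rule described above can be sketched as follows. This is a minimal illustration, not ClickHouse code: `pick_latest_metadata` is a hypothetical helper, and the `keys` argument stands in for the object keys that the real engine obtains by listing S3 through `S3DataLakeMetadataReadHelper::listFilesMatchSuffix`.

```python
from typing import Iterable, Optional

# Hard-coded directory name, as described in the issue.
METADATA_DIR = "metadata"


def pick_latest_metadata(base_prefix: str, keys: Iterable[str]) -> Optional[str]:
    """Return the *metadata.json key under <base_prefix>/metadata/ that sorts last.

    Hypothetical helper: `keys` is assumed to be the already-listed object keys.
    """
    prefix = base_prefix.rstrip("/") + "/" + METADATA_DIR + "/"
    candidates = [
        k for k in keys
        if k.startswith(prefix) and k.endswith("metadata.json")
    ]
    # Plain string comparison, i.e. "last in lexicographical order".
    return max(candidates, default=None)


keys = [
    "clickhouse-bucket/metadata/v1.metadata.json",
    "clickhouse-bucket/metadata/v2.metadata.json",
    "clickhouse-bucket/metadata/snap-123.avro",
    "clickhouse-bucket/data/00000.parquet",
]
print(pick_latest_metadata("clickhouse-bucket", keys))
```

Every call to this rule depends on a full listing of the `metadata` directory, which is the cost the issue goes on to describe.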
There are a few problems with this implementation:
- The name of the metadata directory is unnecessarily hard-coded. It would be better if the `url` parameter directly specified the path to the metadata directory, not its parent directory. Unfortunately, changing this would break compatibility for current users of the Iceberg table engine.
- Listing the metadata files involves one or more S3 `ListObjectsV2` calls. This is slow when the `metadata` directory contains many files, which is often the case for Iceberg tables in the real world.
- The user is not given the option to choose a specific metadata file, which specifies the `current-snapshot-id` for an immutable snapshot of the table. As a result, it is not possible to `CREATE TABLE` from any version of an Iceberg table other than the latest.
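One possible shape for the requested behavior is sketched below, under stated assumptions. If the URL path already ends in `.metadata.json`, the file is used directly and no listing call is made. Otherwise the engine falls back to the current listing-based rule, which keeps existing URLs working. The names `resolve_metadata_key` and `current_snapshot_id` are hypothetical, and `list_keys` stands in for an S3 listing call. Only the `current-snapshot-id` field name comes from the Iceberg table metadata format.

```python
import json
from typing import Callable, Iterable, Optional


def resolve_metadata_key(
    url_path: str, list_keys: Callable[[str], Iterable[str]]
) -> Optional[str]:
    """Pick the metadata file for a table URL (hypothetical sketch)."""
    # Explicit metadata file: pin this exact table version, no listing needed.
    if url_path.endswith(".metadata.json"):
        return url_path
    # Backward-compatible fallback: list <url>/metadata/ and take the
    # lexicographically last *metadata.json key.
    prefix = url_path.rstrip("/") + "/metadata/"
    candidates = [k for k in list_keys(prefix) if k.endswith("metadata.json")]
    return max(candidates, default=None)


def current_snapshot_id(metadata_json: str) -> Optional[int]:
    """Read the snapshot pinned by a metadata file (None if the field is absent)."""
    return json.loads(metadata_json).get("current-snapshot-id")
```

With this split, pointing the `url` at an older metadata file would expose that file's `current-snapshot-id`, which is the immutable snapshot the issue asks to be able to query.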