Allow Iceberg table engine to point to a specific metadata file #47412

@chairmank


The current implementation of the Iceberg table engine takes a URL with the path to an existing Iceberg table. The example given in the documentation is http://test.s3.amazonaws.com/clickhouse-bucket/. The engine calls S3DataLakeMetadataReadHelper::listFilesMatchSuffix to list all *metadata.json files in the metadata directory (this directory name is hard-coded) under this base path, and then picks the metadata file that sorts last in lexicographical order.
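To make the selection rule concrete, here is a minimal Python sketch of the behaviour described above. It is not the ClickHouse code: the function name `pick_latest_metadata` and the object keys are hypothetical, and the only thing it models is "keep the keys ending in metadata.json, take the one that sorts last".

```python
def pick_latest_metadata(keys, suffix="metadata.json"):
    """Return the key that sorts last lexicographically among keys ending in `suffix`.

    Hypothetical helper for illustration; it mirrors the rule described in
    this issue, not the actual C++ implementation.
    """
    candidates = [key for key in keys if key.endswith(suffix)]
    if not candidates:
        raise FileNotFoundError(f"no object key ends with {suffix!r}")
    # Plain string comparison, i.e. lexicographical order.
    return max(candidates)


# Made-up object keys under a hard-coded "metadata" directory.
keys = [
    "clickhouse-bucket/metadata/00000-aaaa.metadata.json",
    "clickhouse-bucket/metadata/00001-bbbb.metadata.json",
    "clickhouse-bucket/metadata/snap-123.avro",
]
latest = pick_latest_metadata(keys)
```

Because the comparison is on strings, the result depends entirely on how the files are named: with names that are not zero-padded, "v9.metadata.json" sorts after "v10.metadata.json".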

There are a few problems with this implementation:

  • The name of the metadata directory is unnecessarily hard-coded. It would be better if the url parameter directly specified the path to the metadata directory, not its parent directory. Unfortunately, changing this would break compatibility for current users of the Iceberg table engine.
  • Listing the metadata files involves one or more S3 ListObjectsV2 calls. This is slow when the metadata directory contains many files, which is often the case for Iceberg tables in the real world.
  • The user cannot choose a specific metadata file, which specifies the current-snapshot-id for an immutable snapshot of the table. As a result, it is not possible to CREATE TABLE from any version of an Iceberg table other than the latest.
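The second and third points can be sketched in Python as follows. This is only an illustration under stated assumptions: `resolve_metadata_key`, its `metadata_file` parameter, and the `list_keys` callback are hypothetical names, not an existing or proposed ClickHouse API. The one external fact relied on is that a single S3 ListObjectsV2 response returns at most 1000 keys.

```python
import math

# A single S3 ListObjectsV2 response contains at most 1000 keys.
MAX_KEYS_PER_LIST_CALL = 1000


def list_calls_needed(object_count):
    """Number of ListObjectsV2 calls needed to page through `object_count` keys."""
    return max(1, math.ceil(object_count / MAX_KEYS_PER_LIST_CALL))


def resolve_metadata_key(base_path, list_keys, metadata_file=None):
    """Pick the metadata file for a table rooted at `base_path`.

    Hypothetical sketch: `list_keys(prefix)` stands in for the S3 listing,
    and `metadata_file` is an assumed optional user-supplied file name.
    """
    metadata_dir = base_path.rstrip("/") + "/metadata/"
    if metadata_file is not None:
        # Explicit file: no listing, and the user can pin an older version.
        return metadata_dir + metadata_file
    # Fallback: list the directory and take the lexicographically last file.
    candidates = [k for k in list_keys(metadata_dir) if k.endswith("metadata.json")]
    if not candidates:
        raise FileNotFoundError(f"no metadata.json file under {metadata_dir}")
    return max(candidates)
```

With an explicit file name the listing callback is never invoked, so the cost no longer grows with the number of objects in the metadata directory; without it, a directory holding 2500 objects would need 3 list calls before a file can be chosen.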
