
S3Queue ordered mode support more generic partitioning #94321

Merged
bharatnc merged 9 commits into master from ncb/s3queue-regex-partition
Jan 20, 2026


bharatnc (Contributor) commented Jan 15, 2026

Follow-up to #81040. This PR implements a more generic way to track partitions by introducing a new enum with the following values:

  1. none: no dedicated partition tracking; the maximum seen file name is tracked only per bucket.
  2. hive: for the new Hive-style key=value partitioning; the maximum seen file name is tracked per bucket and per key=value pair.
  3. regex: added in this PR; lets the user derive a flexible partition key from the file name.
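
A toy Python model of how the none and hive modes differ, assuming ordered mode skips any file whose path is not lexicographically greater than the maximum already seen for its tracking key. `OrderedTracker`, `hive_partition`, and the exact key shapes are hypothetical illustrations, not ClickHouse's C++ implementation; regex mode is left out of this sketch.

```python
from typing import Dict, Optional, Tuple


def hive_partition(path: str) -> Tuple[str, ...]:
    """Hypothetical helper: the key=value directory segments of an object path."""
    return tuple(seg for seg in path.split("/")[:-1] if "=" in seg)


class OrderedTracker:
    """Toy model of ordered-mode bookkeeping (not ClickHouse code).

    Keeps the max path seen per tracking key and skips any path that is not
    lexicographically greater than it.
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("none", "hive"):
            raise ValueError(f"unsupported mode in this sketch: {mode!r}")
        self.mode = mode
        self.max_seen: Dict[Tuple, str] = {}

    def key(self, bucket: int, path: str) -> Tuple:
        if self.mode == "none":
            return (bucket,)  # one watermark per bucket
        return (bucket, hive_partition(path))  # one per bucket and key=value set

    def should_process(self, bucket: int, path: str) -> bool:
        k = self.key(bucket, path)
        last: Optional[str] = self.max_seen.get(k)
        if last is not None and path <= last:
            return False  # not newer than what this key already processed
        self.max_seen[k] = path
        return True


paths = [
    "data/date=2026-01-16/host=a/0002.csv",
    "data/date=2026-01-15/host=b/0001.csv",
]
for mode in ("none", "hive"):
    tracker = OrderedTracker(mode)
    print(mode, [tracker.should_process(0, p) for p in paths])
# none [True, False]
# hive [True, True]
```

With none, the 2026-01-15 file is skipped because a greater path was already seen in bucket 0; with hive, it falls into its own date=2026-01-15/host=b partition and is processed.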

For regex mode, specify partition_regex, an RE2 regular expression with named capture groups, and partition_component, the name of the capture group whose value is used as the partition key. For example:

  • filename: server-1_20251217T100000.000000Z_0001.csv
  • partition_regex: r'(?P<hostname>[^_]+)_(?P<timestamp>\d{8}T\d{6}\.\d{6}Z)_(?P<sequence>\d+)'
  • partition_component: hostname

In this example, partitioning within each bucket is done on hostname, so this file belongs to the server-1 partition.
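
The regex example can be modelled in Python, whose `re` module accepts the same `(?P<name>...)` named-group syntax that RE2 uses for this pattern. This is a hypothetical sketch: `partition_key`, `select_new_files`, the use of an unanchored (partial) match so the `.csv` suffix is ignored, and keying the watermark on `(bucket, hostname)` are assumptions, not the actual implementation.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Values from the example above; Python's re accepts the same
# (?P<name>...) named-group syntax that RE2 uses.
PARTITION_REGEX = re.compile(
    r"(?P<hostname>[^_]+)_(?P<timestamp>\d{8}T\d{6}\.\d{6}Z)_(?P<sequence>\d+)"
)
PARTITION_COMPONENT = "hostname"


def partition_key(file_name: str) -> str:
    """Hypothetical helper: value of the partition_component group.

    Uses an unanchored search (an assumption) so the ".csv" suffix,
    which the pattern does not cover, does not prevent a match.
    """
    match = PARTITION_REGEX.search(file_name)
    if match is None:
        raise ValueError(f"{file_name!r} does not match partition_regex")
    return match.group(PARTITION_COMPONENT)


def select_new_files(
    bucket: int, file_names: Iterable[str], max_seen: Dict[Tuple[int, str], str]
) -> List[str]:
    """Hypothetical ordered-mode pass with one watermark per (bucket, partition)."""
    selected = []
    for name in sorted(file_names):
        key = (bucket, partition_key(name))
        if name > max_seen.get(key, ""):
            max_seen[key] = name  # advance only this partition's watermark
            selected.append(name)
    return selected


state: Dict[Tuple[int, str], str] = {}
print(select_new_files(0, [
    "server-2_20251217T090000.000000Z_0001.csv",
    "server-1_20251217T100000.000000Z_0001.csv",
], state))
print(select_new_files(0, [
    "server-1_20251217T100500.000000Z_0002.csv",
    "server-2_20251217T080000.000000Z_0002.csv",
], state))
```

The first call accepts both files; the second accepts only the server-1 file. That file is compared against server-1's own watermark, whereas a single per-bucket watermark (none mode) would already hold the server-2 name, which sorts after it, so the file would be skipped.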

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Support more generic partitioning for S3Queue ordered mode.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

bharatnc requested a review from kssenii on January 15, 2026 13:45
clickhouse-gh (bot, Contributor) commented Jan 15, 2026

Workflow [PR], commit [87a3902]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel) | | failure | | |
| | 01287_max_execution_speed | FAIL | cidb, issue | ISSUE EXISTS |
| Stateless tests (amd_ubsan, sequential) | | failure | | |
| | 03032_dynamically_resize_filesystem_cache_2 | FAIL | cidb, issue | ISSUE EXISTS |
| Stateless tests (arm_binary, sequential) | | failure | | |
| | 00993_system_parts_race_condition_drop_zookeeper | FAIL | cidb, issue | ISSUE CREATED |
| | Segmentation fault (STID: 4109-4343) | FAIL | cidb, issue | ISSUE CREATED |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting: the query: (STID: 1941-1bfa) | FAIL | cidb, issue | ISSUE EXISTS |

clickhouse-gh bot added the pr-improvement label (Pull request with some product improvements) on Jan 15, 2026
kssenii self-assigned this on Jan 15, 2026
bharatnc force-pushed the ncb/s3queue-regex-partition branch from f9a4a74 to ef6c838 on January 16, 2026 04:31
bharatnc force-pushed the ncb/s3queue-regex-partition branch from 93ca7ae to 0cb3a4c on January 16, 2026 13:24
bharatnc force-pushed the ncb/s3queue-regex-partition branch from 0cb3a4c to 252d8ac on January 16, 2026 13:25
bharatnc force-pushed the ncb/s3queue-regex-partition branch from 11596ea to ffefba4 on January 17, 2026 13:37
bharatnc force-pushed the ncb/s3queue-regex-partition branch from ffefba4 to 2514221 on January 17, 2026 13:39
bharatnc (Contributor, Author) commented Jan 20, 2026

Test failures:

bharatnc added this pull request to the merge queue on Jan 20, 2026
Merged via the queue into master with commit 793b812 on Jan 20, 2026
250 of 260 checks passed
bharatnc deleted the ncb/s3queue-regex-partition branch on January 20, 2026 12:41
robot-clickhouse-ci-2 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) on Jan 20, 2026

Labels

pr-improvement (Pull request with some product improvements), pr-synced-to-cloud (The PR is synced to the cloud repo)


3 participants