Cluster engines (such as s3Cluster) should be used automatically with parallel replicas #65024

@alexey-milovidov

Introduce a new setting, use_parallel_replicas_for_cluster_engines, which we will enable by default.

If the use_parallel_replicas and use_parallel_replicas_for_cluster_engines settings are both enabled, parallel_replicas_mode is task-based, and no other table in the query is distributed, then any of the s3, url, hdfs, or azure engines in the query (all file-like engines except the file engine) should be automatically transformed to the corresponding -Cluster engine.

The cluster for these engines should be controlled by the cluster_for_parallel_replicas setting, and the maximum number of servers should be controlled by the max_parallel_replicas setting.
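
The rule above can be sketched as a small decision function. This is only an illustration of the proposal, not ClickHouse code: the `Settings` class, `rewrite_engine`, and `servers_used` are hypothetical names, the mode string `read_tasks` is an assumed spelling of "task-based", and `azureBlobStorage` is assumed to be the engine meant by "azure".

```python
from dataclasses import dataclass

# Plain engines and their -Cluster counterparts (the file engine is deliberately absent).
# Assumption: "azure" in the proposal refers to azureBlobStorage.
CLUSTER_COUNTERPARTS = {
    "s3": "s3Cluster",
    "url": "urlCluster",
    "hdfs": "hdfsCluster",
    "azureBlobStorage": "azureBlobStorageCluster",
}


@dataclass
class Settings:
    # Hypothetical container for the settings named in the proposal.
    use_parallel_replicas: bool = True
    use_parallel_replicas_for_cluster_engines: bool = True
    parallel_replicas_mode: str = "read_tasks"  # assumed value for "task-based"
    cluster_for_parallel_replicas: str = "default"
    max_parallel_replicas: int = 3


def rewrite_engine(name, args, settings, has_distributed_tables=False):
    """Return (engine, args) after the proposed rewrite, or the input unchanged."""
    eligible = (
        settings.use_parallel_replicas
        and settings.use_parallel_replicas_for_cluster_engines
        and settings.parallel_replicas_mode == "read_tasks"
        and name in CLUSTER_COUNTERPARTS
        and not has_distributed_tables
    )
    if not eligible:
        return name, list(args)
    # The -Cluster engines take the cluster name as their first argument.
    return CLUSTER_COUNTERPARTS[name], [settings.cluster_for_parallel_replicas, *args]


def servers_used(cluster_size, settings):
    """Number of servers that would take part, capped by max_parallel_replicas."""
    return min(cluster_size, settings.max_parallel_replicas)
```

Under these assumptions, `rewrite_engine("s3", ["'https://example.com/data/*.parquet'"], Settings())` yields `s3Cluster` with the cluster name prepended to the arguments, while a query on the file engine, or one that also reads a distributed table, is left unchanged.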

Additional context

The setting parallel_distributed_insert_select should be enabled by default.

The automatic transformation should also be extended to all data lake engines, such as Iceberg.

Metadata

Labels

feature

warmup task: The task for new ClickHouse team members. Low risk, moderate complexity, no urgency.
