Cluster engines (such as s3Cluster) should be used automatically with parallel replicas #65024
Description
Introduce a new setting, use_parallel_replicas_for_cluster_engines, which we will enable by default.
If the use_parallel_replicas and use_parallel_replicas_for_cluster_engines settings are both enabled, parallel_replicas_mode is task-based, the query contains one of s3, url, hdfs, or azure (that is, any file-like engine except the file engine), and no other table in the query is distributed, then these engines should be automatically transformed to the corresponding -Cluster engines.
The cluster for these engines should be controlled by the cluster_for_parallel_replicas setting, and the maximum number of servers should be controlled by the max_parallel_replicas setting.
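The proposed rule can be sketched as a small decision function. This is only an illustration of the conditions listed above, not ClickHouse code: the `Settings` class, the `rewrite_engine` helper, the `"task_based"` string, and the default values are all hypothetical, and the issue's "azure" is assumed to mean `azureBlobStorage`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# File-like engines and their -Cluster counterparts.
# Assumption: "azure" in the description refers to azureBlobStorage.
CLUSTER_COUNTERPART = {
    "s3": "s3Cluster",
    "url": "urlCluster",
    "hdfs": "hdfsCluster",
    "azureBlobStorage": "azureBlobStorageCluster",
}


@dataclass
class Settings:
    """Hypothetical container for the settings named in this issue."""
    use_parallel_replicas: bool = False
    use_parallel_replicas_for_cluster_engines: bool = True  # proposed default
    parallel_replicas_mode: str = "task_based"  # hypothetical value name
    cluster_for_parallel_replicas: str = ""
    max_parallel_replicas: int = 1


def rewrite_engine(
    engine: str, settings: Settings, other_tables_distributed: bool
) -> Optional[Tuple[str, str, int]]:
    """Return (cluster engine, cluster name, max servers), or None if the
    query should be left unchanged."""
    eligible = (
        settings.use_parallel_replicas
        and settings.use_parallel_replicas_for_cluster_engines
        and settings.parallel_replicas_mode == "task_based"
        and engine in CLUSTER_COUNTERPART  # excludes the file engine
        and not other_tables_distributed
    )
    if not eligible:
        return None
    return (
        CLUSTER_COUNTERPART[engine],
        settings.cluster_for_parallel_replicas,  # which cluster to use
        settings.max_parallel_replicas,          # upper bound on servers
    )
```

For example, with `use_parallel_replicas` enabled, `cluster_for_parallel_replicas = "default"`, and `max_parallel_replicas = 3`, a query over `s3` would be treated as `s3Cluster` on the `default` cluster with at most three servers, while a query over `file` would stay as it is.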
Additional context
The setting parallel_distributed_insert_select should be enabled by default.
It should be extended to all data lake engines, such as Iceberg.