Check and mark the interserver IO address active in DDL worker#92339
What happens when the cluster gets updated through
I updated the logic; it now notifies shared_ddl_worker that the host IDs were updated.
```cpp
// Add interserver IO host IDs for Replicated DBs
try
{
    auto host_port = context->getInterserverIOAddress();
    HostID interserver_io_host_id = {host_port.first, port};
    all_host_ids.emplace(interserver_io_host_id.toString());
    LOG_INFO(log, "Add interserver IO host ID {}", interserver_io_host_id.toString());
```
So this is the main part that will fix the issue in the cloud.
The problem was that context->getClusters() does not return Replicated DB clusters, so the list of hosts was empty in the cloud. However, we don't need to check all hosts in Replicated DB clusters: it's enough to use getInterserverIOAddress, which is certainly our own host.
As for the remote_servers config, we notify DDLWorker on config changes, but that is a separate fix.
Yes. Originally, I thought the IP of Replicated DBs was also changeable. But it is not.
I will update the PR description and title.
03221_merge_profile_events, test_merges_memory_limit/test.py::test_memory_limit_success
Cherry pick #92339 to 25.11: Check and mark the interserver IO address active in DDL worker
Cherry pick #92339 to 25.12: Check and mark the interserver IO address active in DDL worker
Backport #92339 to 25.12: Check and mark the interserver IO address active in DDL worker
…very iteration

Replica dirs in ZK are created in enqueueQueryAttempt() when the first DDL is enqueued. At worker init, getChildren(replicas_dir) was still empty, so markReplicasActive() never created replicas_dir/<host_id>/active for those host IDs. The worker requires that active node (and, for loopback, this node's UUID) before executing a task, so the task was skipped with "loopback not claimed" and the initiator saw timeouts (e.g. HTTP 503).

Fix: call markReplicasActive(reinitialized) on every main-loop iteration, before scheduleTasks(), so new replica dirs get their active node before we schedule tasks.

Future backport of ClickHouse#92339. It checks and marks the interserver IO address as active in DDLWorker::markReplicasActive, and notifies DDLWorker when host IDs are updated on cluster config changes. The latter is a separate fix that lets DDLWorker run markReplicasActive again when the host IDs in the remote_servers config are updated.
25.8.16 Stable backport of ClickHouse#92339: Check and mark the interserver IO address active in DDL worker
Backport #92339 to 25.11: Check and mark the interserver IO address active in DDL worker
Cherry pick #92339 to 25.8: Check and mark the interserver IO address active in DDL worker
Backport #92339 to 25.8: Check and mark the interserver IO address active in DDL worker
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Related issue: https://github.com/ClickHouse/support-escalation/issues/6365
Previously, when marking replicas active, we did not check the interserver IO address. This address is used for clusters created by Replicated DBs.
In this PR:
- Check and mark the interserver IO address as active in DDLWorker::markReplicasActive.
- Notify DDLWorker when host IDs are updated on cluster config changes, so it runs markReplicasActive again when the host IDs in the remote_servers config are updated.

Documentation entry for user-facing changes