If some of the hosts are not resolving, DDLWorker can still process the queue entries for our own host. #39710

@filimonov

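After shrinking the cluster, distributed DDL tasks stop working: the queue still holds tasks that list the removed hosts, and DDLWorker restarts its main thread every time it fails to resolve one of them.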

2022.07.29 12:35:05.233187 [ 86 ] {} <Information> DDLWorker: Cleaned DDLWorker state
...
2022.07.29 12:35:10.233326 [ 86 ] {} <Debug> DDLWorker: Initializing DDLWorker thread
2022.07.29 12:35:10.257369 [ 86 ] {} <Debug> DDLWorker: Initialized DDLWorker thread
2022.07.29 12:35:10.257455 [ 86 ] {} <Debug> DDLWorker: Scheduling tasks
2022.07.29 12:35:10.258816 [ 86 ] {} <Debug> DDLWorker: Will schedule 81 tasks starting from query-0000000071
2022.07.29 12:35:10.261600 [ 86 ] {} <Debug> DDLWorker: Will not execute task query-0000000071: Task has been already processed
...
2022.07.29 12:35:10.356069 [ 86 ] {} <Debug> DDLWorker: Will not execute task query-0000000107: Task has been already processed
2022.07.29 12:35:10.358830 [ 86 ] {} <Debug> DDLWorker: Will not execute task query-0000000108: Task has been already processed
2022.07.29 12:35:10.367560 [ 86 ] {} <Error> DNSResolver: Cannot resolve host (chi-clustername-test2-clustername-test-1-0), error 0: Host not found.
2022.07.29 12:35:10.367847 [ 86 ] {} <Error> DDLWorker: Unexpected error, will try to restart main thread: Code: 198. DB::Exception: Not found address of host: chi-clustername-test2-clustername-test-1-0. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xb6ff4ba in /usr/bin/clickhouse
1. ? @ 0xb7ed1c8 in /usr/bin/clickhouse
2. ? @ 0xb7ed9e2 in /usr/bin/clickhouse
3. DB::DNSResolver::resolveAddress(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short) @ 0xb7ee6fd in /usr/bin/clickhouse
4. DB::HostID::isLocalAddress(unsigned short) const @ 0x15d45d2b in /usr/bin/clickhouse
5. DB::DDLTask::findCurrentHostID(std::__1::shared_ptr<DB::Context const>, Poco::Logger*) @ 0x15d488aa in /usr/bin/clickhouse
6. DB::DDLWorker::initAndCheckTask(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::shared_ptr<zkutil::ZooKeeper> const&) @ 0x15d52546 in /usr/bin/clickhouse
7. DB::DDLWorker::scheduleTasks(bool) @ 0x15d55d0a in /usr/bin/clickhouse
8. DB::DDLWorker::runMainThread() @ 0x15d4fb05 in /usr/bin/clickhouse
9. ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::DDLWorker::*)(), DB::DDLWorker*>(void (DB::DDLWorker::*&&)(), DB::DDLWorker*&&)::'lambda'()::operator()() @ 0x15d63bd7 in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xb7aad67 in /usr/bin/clickhouse
11. ? @ 0xb7ae79d in /usr/bin/clickhouse
12. ? @ 0x7f0ef848e609 in ?
13. clone @ 0x7f0ef83b3133 in ?
 (version 22.4.6.53 (official build))
2022.07.29 12:35:10.367899 [ 86 ] {} <Information> DDLWorker: Cleaned DDLWorker state
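
The behaviour requested in the title can be illustrated with a small Python simulation. This is only a sketch of the idea, not ClickHouse code: `find_current_host`, `HostNotFound`, the `resolve` callback and the IP addresses are all hypothetical names and values made up for the example. The point is that a host that fails to resolve is recorded and skipped, so the task can still be matched against our own host.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


class HostNotFound(Exception):
    """Stand-in for the DNS_ERROR (code 198) raised in the log above."""


def find_current_host(
    task_hosts: Iterable[str],
    local_addresses: Set[str],
    resolve: Callable[[str], str],
) -> Tuple[Optional[str], List[str]]:
    """Return (our own host or None, hosts that failed to resolve)."""
    current: Optional[str] = None
    unresolved: List[str] = []
    for host in task_hosts:
        try:
            address = resolve(host)
        except HostNotFound:
            # Tolerate the failure: remember the host and keep scanning.
            unresolved.append(host)
            continue
        if current is None and address in local_addresses:
            current = host
    return current, unresolved


# Hypothetical DNS table: the "-1-0" replica was removed from the cluster.
DNS = {
    "chi-clustername-test2-clustername-test-0-0": "10.0.0.1",
    "chi-clustername-test2-clustername-test-0-1": "10.0.0.2",
}


def resolve(host: str) -> str:
    """Look the host up in the fake DNS table, raising if it is missing."""
    try:
        return DNS[host]
    except KeyError:
        raise HostNotFound(host) from None


hosts = [
    "chi-clustername-test2-clustername-test-0-0",
    "chi-clustername-test2-clustername-test-0-1",
    "chi-clustername-test2-clustername-test-1-0",
]
# Pretend this server owns 10.0.0.2, i.e. it is the "-0-1" replica.
current, unresolved = find_current_host(hosts, {"10.0.0.2"}, resolve)
print(current, unresolved)
```

With this approach the unresolved hosts could be logged as a warning while the task is still executed locally; whether that is acceptable for every kind of task is a design question for the maintainers.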



=============

SELECT *
FROM system.zookeeper
WHERE (path = '/clickhouse/clustername-test2/task_queue/ddl') AND ((name LIKE '%108') OR (name LIKE '%109'))
ORDER BY name ASC

Query id: a84cec71-92bc-4c99-b03a-c06f920cd419

Row 1:
──────
name:           query-0000000108
value:          version: 2
query: ALTER TABLE clustername...
hosts: ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']
initiator: chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0%2D0%2Echi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0%2Ealtinity%2Dcloud%2Dmanaged%2Dclickhouse%2Esvc%2Ecluster%2Elocal:9000
settings: connect_timeout_with_failover_ms = 1000, load_balancing = 'random', distributed_aggregation_memory_efficient = true, log_queries = true, max_bytes_before_external_group_by = 3221225472, max_bytes_before_external_sort = 3221225472, max_memory_usage = 5798205849, max_memory_usage_for_user = 5798205849, parallel_view_processing = true, default_database_engine = 'Ordinary', allow_nondeterministic_mutations = true, async_insert_max_data_size = 26214400

czxid:          51539609497
mzxid:          51539609497
ctime:          2022-07-27 18:30:49
mtime:          2022-07-27 18:30:49
version:        0
cversion:       3
aversion:       0
ephemeralOwner: 0
dataLength:     1083
numChildren:    3
pzxid:          51539609500
path:           /clickhouse/clustername-test2/task_queue/ddl

Row 2:
──────
name:           query-0000000109
value:          version: 2
query: CREATE TABLE IF NOT EXISTS...
hosts: ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D1:9000']
initiator: chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D1%2D0%2Echi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D1%2Ealtinity%2Dcloud%2Dmanaged%2Dclickhouse%2Esvc%2Ecluster%2Elocal:9000
settings: connect_timeout_with_failover_ms = 1000, load_balancing = 'random', distributed_aggregation_memory_efficient = true, log_queries = true, max_bytes_before_external_group_by = 3221225472, max_bytes_before_external_sort = 3221225472, max_memory_usage = 5798205849, max_memory_usage_for_user = 5798205849, parallel_view_processing = true, default_database_engine = 'Ordinary', allow_nondeterministic_mutations = true, async_insert_max_data_size = 26214400

czxid:          51539609669
mzxid:          51539609669
ctime:          2022-07-28 00:06:30
mtime:          2022-07-28 00:06:30
version:        0
cversion:       2
aversion:       0
ephemeralOwner: 0
dataLength:     1477
numChildren:    2
pzxid:          51539609670
path:           /clickhouse/clustername-test2/task_queue/ddl

2 rows in set. Elapsed: 0.015 sec. 


=========



WITH CAST(arrayMap(x -> ((extractAllGroups(x, '^(\\w+):\\s*(.*)$')[1])[1], (extractAllGroups(x, '^(\\w+):\\s*(.*)$')[1])[2]), splitByChar('\n', value)), 'Map(String,String)') AS task
SELECT
    name,
    task['hosts'] AS hosts
FROM system.zookeeper
WHERE path = '/clickhouse/clustername-test2/task_queue/ddl'
ORDER BY name ASC

Query id: 0332788d-bf98-4204-bac3-5dcae838b4bc

┌─name─────────────┬─hosts──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ query-0000000071 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']                                                                                                                                      │
│ query-0000000072 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']                                                                                                                                      │
...
│ query-0000000107 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']                                                                                                                                      │
│ query-0000000108 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']                                                                                                                                      │
│ query-0000000109 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D1:9000'] │
│ query-0000000110 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D1:9000'] │
...
│ query-0000000128 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D1:9000'] │
│ query-0000000129 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D1%2D1:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D2%2D1:9000'] │
│ query-0000000130 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']                                                                                                                                      │
│ query-0000000131 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']                                                                                                                                      │
...
│ query-0000000150 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']                                                                                                                                      │
│ query-0000000151 │ ['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000','chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']                                                                                                                                      │
└──────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
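
The `hosts` values above are percent-encoded (`%2D` is `-`). A minimal Python helper, assuming the field always has the `['host:port', ...]` shape shown in the output, can decode it for comparison with the host name in the DNS error. `decode_hosts` is a hypothetical name used only for this example.

```python
import ast
from typing import List, Tuple
from urllib.parse import unquote


def decode_hosts(hosts_field: str) -> List[Tuple[str, int]]:
    """Turn the `hosts` field of a DDL task into (host, port) pairs."""
    # The field is printed as a list of quoted strings, so it can be
    # parsed safely as a literal without evaluating arbitrary code.
    entries = ast.literal_eval(hosts_field)
    decoded = []
    for entry in entries:
        # Undo the percent-encoding, then split on the last colon.
        host, _, port = unquote(entry).rpartition(":")
        decoded.append((host, int(port)))
    return decoded


# Value copied from the query-0000000071 row above.
sample = (
    "['chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D0:9000',"
    "'chi%2Dclustername%2Dtest2%2Dclustername%2Dtest%2D0%2D1:9000']"
)
for host, port in decode_hosts(sample):
    print(host, port)
```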

Labels

comp-ddl (DDL command coordination and execution: ON CLUSTER, DDL queue), enhancement
