ddl requests hang for engine=kafka() table

* There is table with engine=kafka() that watches to kafka topic with ~ 10K events per second. Actually, it happens on different topics with different eps, but they must be "heavy". Also, there is 3 more tables: Materialized view for reading and inserting data, table with data on disk and distributed table, for cluster reason. 
* After some time(i can't get exactly how much, some times hour, sometimes more) when enough data was inserted, i had detached MV table on one random cluster node. And it succedded. 
* Then i try to detach or drop table with kafka engine. Request fails with timeout error:

```
#> drop table kafka.kafka_engine_table
[2020-05-07 16:27:01] Read timed out
```

In logs i sow something like that:
```
2020.05.07 16:22:01.745935 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Debug> executeQuery: (from x.x.x.x:58480, user: admin) select timezone() FORMAT TabSeparatedWithNamesAndTypes;
2020.05.07 16:22:01.746132 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Trace> AccessRightsContext (admin): Access granted: SELECT(dummy) ON system.one
2020.05.07 16:22:01.746176 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.05.07 16:22:01.746319 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Information> executeQuery: Read 1 rows, 1.00 B in 0.000 sec., 2839 rows/sec., 2.77 KiB/sec.
2020.05.07 16:22:01.746395 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Debug> MemoryTracker: Peak memory usage (for query): 4.01 MiB.
2020.05.07 16:22:01.751176 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Debug> executeQuery: (from  x.x.x.x:58480, user: admin) select version() FORMAT TabSeparatedWithNamesAndTypes;
2020.05.07 16:22:01.751272 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Trace> AccessRightsContext (admin): Access granted: SELECT(dummy) ON system.one
2020.05.07 16:22:01.751293 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.05.07 16:22:01.751383 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Information> executeQuery: Read 1 rows, 1.00 B in 0.000 sec., 5511 rows/sec., 5.38 KiB/sec.
2020.05.07 16:22:01.751440 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Debug> MemoryTracker: Peak memory usage (for query): 4.00 MiB.
2020.05.07 16:22:01.779272 [ 134009 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Debug> executeQuery: (from  x.x.x.x:58480, user: admin) SELECT 1 FORMAT TabSeparatedWithNamesAndTypes;
2020.05.07 16:22:01.779425 [ 134009 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Trace> AccessRightsContext (admin): Access granted: SELECT(dummy) ON system.one
2020.05.07 16:22:01.779467 [ 134009 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.05.07 16:22:01.779654 [ 134009 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Information> executeQuery: Read 1 rows, 1.00 B in 0.000 sec., 2922 rows/sec., 2.85 KiB/sec.
2020.05.07 16:22:01.779759 [ 134009 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Debug> MemoryTracker: Peak memory usage (for query): 4.00 MiB.
2020.05.07 16:22:01.788321 [ 256078 ] {7b2fe9a4-5a7b-4853-a50b-cc146fffc75d} <Debug> executeQuery: (from  x.x.x.x:58480, user: admin) drop table kafka.kafka_engine_table
```
And nothing more...
strace for  thread shows  loop like that:

```
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TIMER, si_timerid=0x3efda, ```
si_overrun=0, si_value={int=0, ptr=0}} ---
mincore(0x7f5b568ec000, 1, [1])         = 0
mincore(0x7f5b568ec000, 1, [1])         = 0
mincore(0x7f5b568ec000, 1, [1])         = 0
mincore(0x7f5b568ed000, 1, [1])         = 0
mincore(0x7f5b568ed000, 1, [1])         = 0
mincore(0x7f5b568ed000, 1, [1])         = 0
mincore(0x7f5b568ed000, 1, [1])         = 0
mincore(0x7f5b568ed000, 1, [1])         = 0
mincore(0x7f5b568ed000, 1, [1])         = 0
mincore(0x7f5b568ed000, 1, [1])         = 0
mincore(0x7f5b568ed000, 1, [1])         = 0
mincore(0x7f5b568ee000, 1, [1])         = 0
mincore(0x7f5b568ef000, 1, [1])         = 0
mincore(0x7f5b568ef000, 1, [1])         = 0
mincore(0x7f5b568f0000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
mincore(0x7f5b568f1000, 1, [1])         = 0
write(3254, "\0$7b2fe9a4-5a7b-4853-a50b-cc146f"..., 216) = 216
rt_sigreturn({mask=[PIPE]})             = 202
futex(0x7f63c882c5b0, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
```


But this issue started with more complicated case with cluster DDL. When i detached MV tables on all cluster nodes. Then i tried to drop kafka tables by "on cluster" query and request failed with timeout:

`[2020-05-07 14:43:08] Code: 159, e.displayText() = DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000008894 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 10 unfinished hosts (0 of them are currently active), they are going to execute the query in background (version 20.3.8.53 (official build))`

And after that no one ddl on cluster had not been worked. In zookeeper for query 0000008894 i sow 10 nodes in status active. And for futher queries nor active or finished nodes was created in zookeeper.


Version:  version 20.3.8.53 but it happened in 19th branch it happened too. 




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ddl requests hang for engine=kafka() table #10740

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ddl requests hang for engine=kafka() table #10740

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions