Parallel table shutdown #72557

@azat

While looking into some CI failures, like this one, I found that the problem was slow shutdown of the tables that are used to proxy system tables to the cloud.

Even flush_on_detach=0 does not help for Distributed tables, since for some tables the sending may already be in progress.

I think that parallel table shutdown should help; it would also improve the performance of table shutdown in general.
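To illustrate the idea, here is a minimal sketch of what a parallel shutdown could look like. It is not ClickHouse code: `Table` and `shutdownTablesInParallel` are hypothetical names, and a real implementation would presumably reuse the server's own thread pool and handle exceptions from individual tables. The point is only that one slow table (for example a Distributed table waiting on an in-flight send) no longer delays the shutdown of all the others.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a table; NOT ClickHouse's IStorage interface.
struct Table
{
    // Assumed to block until the table has fully stopped and not to throw
    // (an exception escaping a std::thread would call std::terminate).
    std::function<void()> shutdown;
};

// Shut down all tables using at most max_threads worker threads.
// Each worker repeatedly claims the next unprocessed table index, so a
// single slow table occupies only one worker while the rest make progress.
void shutdownTablesInParallel(std::vector<Table> & tables, std::size_t max_threads)
{
    assert(max_threads > 0);

    std::atomic<std::size_t> next{0};
    auto worker = [&]
    {
        for (std::size_t i = next.fetch_add(1); i < tables.size(); i = next.fetch_add(1))
            tables[i].shutdown();
    };

    const std::size_t thread_count = std::min(max_threads, tables.size());
    std::vector<std::thread> threads;
    threads.reserve(thread_count);
    for (std::size_t i = 0; i < thread_count; ++i)
        threads.emplace_back(worker);

    // Wait until every table has been shut down before returning.
    for (auto & thread : threads)
        thread.join();
}
```

With a sequential loop the total shutdown time is the sum of the per-table times; with this sketch it is bounded below by the slowest single table, which is the case that matters for the trace below.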

CI failure details
2024.11.25 22:25:09.343396 [ 35138 ] {} <Trace> BaseDaemon: Received signal 15
2024.11.25 22:25:09.343477 [ 35138 ] {} <Information> Application: Received termination signal (Terminated)
2024.11.25 22:25:09.343544 [ 35137 ] {} <Debug> Application: Received termination signal.
...
2024.11.25 22:26:56.877500 [ 35137 ] {} <Debug> StorageDistributed (zookeeper_log_sender): Joining background threads for async INSERT
2024.11.25 22:26:56.877530 [ 35137 ] {} <Debug> StorageDistributed (zookeeper_log_sender): Background threads for async INSERT joined
2024.11.25 22:26:57.284909 [ 35133 ] {} <Fatal> Application: Child process was terminated by signal 9 (KILL). If it is not done by 'forcestop' command or manually, the possible cause is OOM Killer (see 'dmesg' and look at the '/var/log/kern.log' for the details).
2024-11-25 22:26:56 Thread 1 (Thread 0x7f0d9c23f500 (LWP 35137) "clickhouse-serv"):
2024-11-25 22:26:56 #0  0x00007f0d9c3c02c0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
2024-11-25 22:26:56 #1  0x00007f0d9c3c7002 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libc.so.6
2024-11-25 22:26:56 #2  0x00005618e962896b in pthread_mutex_lock (arg=0x511000ba64e8) at ./build_docker/./src/Common/ThreadFuzzer.cpp:427
2024-11-25 22:26:56 #3  0x000056190a964eeb in std::__1::__libcpp_mutex_lock[abi:v15007](pthread_mutex_t*) (__m=0x511000ba64e8) at ./contrib/llvm-project/libcxx/include/__threading_support:304
2024-11-25 22:26:56 #4  std::__1::mutex::lock (this=0x511000ba64e8) at ./build_docker/./contrib/llvm-project/libcxx/src/mutex.cpp:38
2024-11-25 22:26:56 #5  0x00005618f657245e in std::__1::lock_guard<std::__1::mutex>::lock_guard[abi:v15007](std::__1::mutex&) (__m=..., this=<optimized out>) at ./contrib/llvm-project/libcxx/include/__mutex_base:94
2024-11-25 22:26:56 #6  DB::BackgroundSchedulePoolTaskInfo::deactivate (this=0x511000ba6498) at ./build_docker/./src/Core/BackgroundSchedulePool.cpp:51
2024-11-25 22:26:56 #7  0x00005618fc95eeb3 in DB::DistributedAsyncInsertDirectoryQueue::shutdownWithoutFlush (this=<optimized out>) at ./build_docker/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:193
2024-11-25 22:26:56 #8  0x00005618fbeba433 in DB::StorageDistributed::flushClusterNodesAllDataImpl (this=<optimized out>, local_context=..., settings_changes=..., flush=<optimized out>) at ./build_docker/./src/Storages/StorageDistributed.cpp:1764
2024-11-25 22:26:56 #9  0x00005618fbeb9120 in DB::StorageDistributed::flushAndPrepareForShutdown (this=0x519000270ba0) at ./build_docker/./src/Storages/StorageDistributed.cpp:1711
2024-11-25 22:26:56 #10 0x00005618f7147799 in DB::DatabaseWithOwnTablesBase::shutdown (this=<optimized out>) at ./build_docker/./src/Databases/DatabasesCommon.cpp:452
2024-11-25 22:26:56 #11 0x00005618f7ea4308 in DB::DatabaseCatalog::shutdownImpl (this=0x51d00005a080) at ./build_docker/./src/Interpreters/DatabaseCatalog.cpp:281
2024-11-25 22:26:56 #12 0x00005618f7ced0f8 in DB::ContextSharedPart::shutdown (this=0x5210002a8100) at ./build_docker/./src/Interpreters/Context.cpp:744
2024-11-25 22:26:56 #13 0x00005618e9960f55 in DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)::$_2::operator()() const (this=0x7ffdd888d770) at ./build_docker/./programs/server/Server.cpp:1085

2024-11-25 22:26:51 Thread 291 (Thread 0x7f0a06f0a640 (LWP 35698) "BgDistSchPool"):
2024-11-25 22:26:51 #0  0x00007f0d9c44381c in read () from /lib/x86_64-linux-gnu/libc.so.6
2024-11-25 22:26:51 #1  0x00005618d8cd0066 in __interceptor_read ()
2024-11-25 22:26:51 #2  0x0000561909deba9e in sock_read (b=0x50d0005adae0, out=0x52900028f203 "\027\003\003\005\323\001", outl=5) at ./build_docker/./contrib/openssl/crypto/bio/bss_sock.c:127
2024-11-25 22:26:51 #3  0x0000561909dd7109 in bread_conv (bio=0x4b, data=0x52900028f203 "\027\003\003\005\323\001", datal=5, readbytes=0x7f0a04935960) at ./build_docker/./contrib/openssl/crypto/bio/bio_meth.c:121
2024-11-25 22:26:51 #4  0x0000561909dd352f in bio_read_intern (b=0x4b, b@entry=0x50d0005adae0, data=0x52900028f203, dlen=0, readbytes=0x7f0d9c44381c <read+76>, readbytes@entry=0x7f0a04935960) at ./build_docker/./contrib/openssl/crypto/bio/bio_lib.c:285
2024-11-25 22:26:51 #5  0x0000561909dd3292 in BIO_read (b=<optimized out>, data=<optimized out>, dlen=<optimized out>) at ./build_docker/./contrib/openssl/crypto/bio/bio_lib.c:311
2024-11-25 22:26:51 #6  0x0000561909d4dff3 in tls_default_read_n (rl=0x52100005f100, n=5, max=5, extend=<optimized out>, clearold=<optimized out>, readbytes=0x7f0a04aff220) at ./build_docker/./contrib/openssl/ssl/record/methods/tls_common.c:406
2024-11-25 22:26:51 #7  0x0000561909d4e7cc in tls_get_more_records (rl=0x52100005f100) at ./build_docker/./contrib/openssl/ssl/record/methods/tls_common.c:583
2024-11-25 22:26:51 #8  0x0000561909d50dac in tls_read_record (rl=0x52100005f100, rechandle=0x522001a64d80, rversion=0x522001a64d88, type=0x522001a64d8c "\027", data=0x522001a64d90, datalen=0x522001a64da0, epoch=0x0, seq_num=0x0) at ./build_docker/./contrib/openssl/ssl/record/methods/tls_common.c:1130
2024-11-25 22:26:51 #9  0x0000561909d4443e in ssl3_read_bytes (ssl=0x522001a64100, type=23 '\027', recvd_type=0x0, buf=0x7f0908913800 "\001", len=1048576, peek=0, readbytes=0x7f0a04935920) at ./build_docker/./contrib/openssl/ssl/record/rec_layer_s3.c:689
2024-11-25 22:26:51 #10 0x0000561909c949b8 in ssl3_read_internal (s=0x522001a64100, buf=0x7f0908913800, len=1048576, peek=0, readbytes=<optimized out>) at ./build_docker/./contrib/openssl/ssl/s3_lib.c:4528
2024-11-25 22:26:51 #11 0x0000561909cb027c in ssl_read_internal (s=0x522001a64100, buf=<optimized out>, num=<optimized out>, readbytes=<optimized out>) at ./build_docker/./contrib/openssl/ssl/ssl_lib.c:2314
2024-11-25 22:26:51 #12 0x0000561909cb0b52 in SSL_read (s=<optimized out>, buf=<optimized out>, num=<optimized out>) at ./build_docker/./contrib/openssl/ssl/ssl_lib.c:2328
2024-11-25 22:26:51 #13 0x0000561904e7355b in Poco::Net::SecureSocketImpl::receiveBytes (this=0x50f000947c58, buffer=<optimized out>, length=<optimized out>, flags=<optimized out>) at ./build_docker/./base/poco/NetSSL_OpenSSL/src/SecureSocketImpl.cpp:356
2024-11-25 22:26:51 #14 0x00005618e98860f4 in DB::ReadBufferFromPocoSocketBase::socketReceiveBytesImpl (this=<optimized out>, ptr=<optimized out>, size=<optimized out>) at ./build_docker/./src/IO/ReadBufferFromPocoSocket.cpp:78
2024-11-25 22:26:51 #15 0x00005618e9887884 in DB::ReadBufferFromPocoSocketBase::nextImpl (this=0x5110009ff158) at ./build_docker/./src/IO/ReadBufferFromPocoSocket.cpp:107
2024-11-25 22:26:51 #16 0x00005618fd739b36 in DB::ReadBufferFromPocoSocketChunked::nextImpl (this=0x5110009ff158) at ./build_docker/./src/IO/ReadBufferFromPocoSocketChunked.cpp:103
2024-11-25 22:26:51 #17 0x00005618d9027a6b in DB::ReadBuffer::next() ()
2024-11-25 22:26:51 #18 0x00005618e967608a in DB::ReadBuffer::eof (this=0x5110009ff158) at ./src/IO/ReadBuffer.h:106
2024-11-25 22:26:51 #19 DB::varint_impl::readVarUInt<true> (x=@0x7f0a04caccd0: 0, istr=...) at ./src/IO/VarInt.h:95
2024-11-25 22:26:51 #20 0x00005618fd72c41a in DB::readVarUInt (x=@0x7f0a04caccd0: 0, istr=...) at ./src/IO/VarInt.h:114
2024-11-25 22:26:51 #21 DB::Connection::receivePacket (this=<optimized out>) at ./build_docker/./src/Client/Connection.cpp:1210
2024-11-25 22:26:51 #22 0x00005618fc97f57d in DB::RemoteInserter::onFinish (this=0x7f0a04f38960) at ./build_docker/./src/QueryPipeline/RemoteInserter.cpp:133
2024-11-25 22:26:51 #23 0x00005618fc967cd8 in DB::DistributedAsyncInsertDirectoryQueue::processFile (this=<optimized out>, file_path=..., settings_changes=...) at ./build_docker/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:449
2024-11-25 22:26:51 #24 0x00005618fc95e676 in DB::DistributedAsyncInsertDirectoryQueue::processFiles (this=0x5170003a1100, settings_changes=...) at ./build_docker/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:395
2024-11-25 22:26:51 #25 0x00005618fc95f2f2 in DB::DistributedAsyncInsertDirectoryQueue::run (this=0x5170003a1100) at ./build_docker/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:215
2024-11-25 22:26:51 #26 0x00005618f6572c8f in std::__1::__function::__policy_func<void ()>::operator()[abi:v15007]() const (this=0x4b) at ./contrib/llvm-project/libcxx/include/__functional/function.h:848
2024-11-25 22:26:51 #27 std::__1::function<void()>::operator() (this=0x4b) at ./contrib/llvm-project/libcxx/include/__functional/function.h:1197
2024-11-25 22:26:51 #28 DB::BackgroundSchedulePoolTaskInfo::execute (this=0x511000ba6498) at ./build_docker/./src/Core/BackgroundSchedulePool.cpp:106
