ThreadSanitizer: deadlock in signal handler #1540

@tavplubix

Hi, it seems our stress tests have found a deadlock in TSan while testing ClickHouse. The stack trace is:

Thread 238 (Thread 0x7fa07287e700 (LWP 925)):
#0  0x000000000b2949aa in __sanitizer::FutexWait(__sanitizer::atomic_uint32_t*, unsigned int) ()
#1  0x000000000b295a4a in __sanitizer::Semaphore::Wait() ()
#2  0x000000000b31a23f in __tsan::SlotLock(__tsan::ThreadState*) ()
#3  0x000000000b32a2c3 in __tsan::Acquire(__tsan::ThreadState*, unsigned long, unsigned long) ()
#4  0x000000000b2bd7b8 in __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) ()
#5  0x000000000b2bdcd4 in sighandler(int, __sanitizer::__sanitizer_siginfo*, void*) ()
#6  <signal handler called>
#7  0x000000000b33205e in __tsan::MetaMap::FreeRange(__tsan::Processor*, unsigned long, unsigned long, bool) ()
#8  0x000000000b331d0e in __tsan::MetaMap::FreeBlock(__tsan::Processor*, unsigned long, bool) ()
#9  0x000000000b31334f in __tsan::OnUserFree(__tsan::ThreadState*, unsigned long, unsigned long, bool) ()
#10 0x000000000b313172 in __tsan::user_free(__tsan::ThreadState*, unsigned long, void*, bool) ()
#11 0x000000000b2b5df5 in free ()
#12 0x00007fa19dd67a75 in _dl_deallocate_tls () from /lib64/ld-linux-x86-64.so.2
#13 0x00007fa19dd03242 in free_stacks () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007fa19dd04522 in __free_tcb () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fa19dd05b29 in __pthread_clockjoin_ex () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x000000000b2b72de in pthread_join ()
#17 0x000000001edc0d66 in Poco::ThreadImpl::joinImpl (this=this@entry=0x7b5400457938, milliseconds=milliseconds@entry=10000) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:247
#18 0x000000001edc1dc1 in Poco::Thread::tryJoin (this=0x7b5400457938, milliseconds=10000) at ../contrib/poco/Foundation/src/Thread.cpp:153
#19 0x000000001edc3b93 in Poco::PooledThread::release (this=0x7b5400457900) at ../contrib/poco/Foundation/src/ThreadPool.cpp:179
#20 0x000000001edc53d1 in Poco::ThreadPool::housekeep (this=this@entry=0x7ffc8fc36030) at ../contrib/poco/Foundation/src/ThreadPool.cpp:435
#21 0x000000001edc5aeb in Poco::ThreadPool::getThread (this=this@entry=0x7ffc8fc36030) at ../contrib/poco/Foundation/src/ThreadPool.cpp:446
#22 0x000000001edc5fc8 in Poco::ThreadPool::startWithPriority (this=0x7ffc8fc36030, priority=Poco::Thread::PRIO_NORMAL, target=..., name=...) at ../contrib/poco/Foundation/src/ThreadPool.cpp:365
#23 0x000000001eb7898f in Poco::Net::TCPServerDispatcher::enqueue (this=0x7b4000001600, socket=...) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:152
#24 0x000000001eb774a8 in Poco::Net::TCPServer::run (this=<optimized out>) at ../contrib/poco/Net/src/TCPServer.cpp:148
#25 0x000000001edc20f0 in Poco::(anonymous namespace)::RunnableHolder::run (this=<optimized out>) at ../contrib/poco/Foundation/src/Thread.cpp:55
#26 0x000000001edc082c in Poco::ThreadImpl::runnableEntry (pThread=0x7b4400019168) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#27 0x000000000b2b6f19 in __tsan_thread_start_func ()
#28 0x00007fa19dd04609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#29 0x00007fa19dc29133 in clone () from /lib/x86_64-linux-gnu/libc.so.6

I looked through the LLVM code, and it looks like this thread is trying to lock the same mutex a second time:
first in __tsan::OnUserFree (frame #9, before the signal arrived): https://github.com/llvm/llvm-project/blob/12e137ab24dae51553433e00ff96e28a14d5b1f5/compiler-rt/lib/tsan/rtl/tsan_mman.cpp#L272
and then in __tsan::Acquire, called from __tsan::CallUserSignalHandler (frames #3 and #4): https://github.com/llvm/llvm-project/blob/f831d6fc800ccf22c1c09888fce3e3c8ebc2c992/compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp#L448
If I understand correctly, SlotLocker locks thr->slot->mtx, which is a __sanitizer::Mutex, and it is not recursive.
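
To illustrate the suspected pattern outside of TSan, here is a minimal sketch of a non-recursive lock being re-entered from a signal handler on the same thread. It is not TSan code. The names slot_lock, try_lock_slot, unlock_slot, on_signal and demo_reentrant_lock are hypothetical, and a std::atomic_flag stands in for thr->slot->mtx. The handler uses a try-lock so the demo reports the conflict instead of hanging; a blocking lock at that point would wait forever, because the only thread that can release the lock is the one the handler interrupted.

```cpp
#include <atomic>
#include <csignal>

// Hypothetical stand-in for a non-recursive lock such as thr->slot->mtx.
static std::atomic_flag slot_lock = ATOMIC_FLAG_INIT;

// Set by the handler when it finds the lock already held by the interrupted code.
static volatile std::sig_atomic_t handler_would_block = 0;

// Returns true if the lock was free and is now held by the caller.
static bool try_lock_slot() {
    return !slot_lock.test_and_set(std::memory_order_acquire);
}

static void unlock_slot() {
    slot_lock.clear(std::memory_order_release);
}

// Plays the role of the signal path that calls Acquire() and wants the slot lock.
extern "C" void on_signal(int) {
    if (try_lock_slot()) {
        unlock_slot();
    } else {
        // A blocking lock here would never return: the holder is the very
        // thread this handler interrupted, so it cannot make progress.
        handler_would_block = 1;
    }
}

// Returns true if the handler observed the lock held by its own thread.
bool demo_reentrant_lock() {
    handler_would_block = 0;
    std::signal(SIGUSR1, on_signal);

    try_lock_slot();      // analogous to the SlotLocker taken on the free() path
    std::raise(SIGUSR1);  // the handler runs on this thread before raise() returns
    unlock_slot();

    std::signal(SIGUSR1, SIG_DFL);
    return handler_would_block != 0;
}
```

Calling demo_reentrant_lock() from main should return true, meaning the handler found the lock already taken by the code it interrupted. This matches the stack trace above, where SlotLock is reached from sighandler while the free() path is still in progress.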

The binary was built with Clang 14.0.5 (build log).
