ThreadSanitizer: deadlock in signal handler #1540
Closed
Description
Hi, it seems our stress tests found a deadlock in TSan while testing ClickHouse. The stack trace is:
Thread 238 (Thread 0x7fa07287e700 (LWP 925)):
#0 0x000000000b2949aa in __sanitizer::FutexWait(__sanitizer::atomic_uint32_t*, unsigned int) ()
#1 0x000000000b295a4a in __sanitizer::Semaphore::Wait() ()
#2 0x000000000b31a23f in __tsan::SlotLock(__tsan::ThreadState*) ()
#3 0x000000000b32a2c3 in __tsan::Acquire(__tsan::ThreadState*, unsigned long, unsigned long) ()
#4 0x000000000b2bd7b8 in __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) ()
#5 0x000000000b2bdcd4 in sighandler(int, __sanitizer::__sanitizer_siginfo*, void*) ()
#6 <signal handler called>
#7 0x000000000b33205e in __tsan::MetaMap::FreeRange(__tsan::Processor*, unsigned long, unsigned long, bool) ()
#8 0x000000000b331d0e in __tsan::MetaMap::FreeBlock(__tsan::Processor*, unsigned long, bool) ()
#9 0x000000000b31334f in __tsan::OnUserFree(__tsan::ThreadState*, unsigned long, unsigned long, bool) ()
#10 0x000000000b313172 in __tsan::user_free(__tsan::ThreadState*, unsigned long, void*, bool) ()
#11 0x000000000b2b5df5 in free ()
#12 0x00007fa19dd67a75 in _dl_deallocate_tls () from /lib64/ld-linux-x86-64.so.2
#13 0x00007fa19dd03242 in free_stacks () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007fa19dd04522 in __free_tcb () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fa19dd05b29 in __pthread_clockjoin_ex () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x000000000b2b72de in pthread_join ()
#17 0x000000001edc0d66 in Poco::ThreadImpl::joinImpl (this=this@entry=0x7b5400457938, milliseconds=milliseconds@entry=10000) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:247
#18 0x000000001edc1dc1 in Poco::Thread::tryJoin (this=0x7b5400457938, milliseconds=10000) at ../contrib/poco/Foundation/src/Thread.cpp:153
#19 0x000000001edc3b93 in Poco::PooledThread::release (this=0x7b5400457900) at ../contrib/poco/Foundation/src/ThreadPool.cpp:179
#20 0x000000001edc53d1 in Poco::ThreadPool::housekeep (this=this@entry=0x7ffc8fc36030) at ../contrib/poco/Foundation/src/ThreadPool.cpp:435
#21 0x000000001edc5aeb in Poco::ThreadPool::getThread (this=this@entry=0x7ffc8fc36030) at ../contrib/poco/Foundation/src/ThreadPool.cpp:446
#22 0x000000001edc5fc8 in Poco::ThreadPool::startWithPriority (this=0x7ffc8fc36030, priority=Poco::Thread::PRIO_NORMAL, target=..., name=...) at ../contrib/poco/Foundation/src/ThreadPool.cpp:365
#23 0x000000001eb7898f in Poco::Net::TCPServerDispatcher::enqueue (this=0x7b4000001600, socket=...) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:152
#24 0x000000001eb774a8 in Poco::Net::TCPServer::run (this=<optimized out>) at ../contrib/poco/Net/src/TCPServer.cpp:148
#25 0x000000001edc20f0 in Poco::(anonymous namespace)::RunnableHolder::run (this=<optimized out>) at ../contrib/poco/Foundation/src/Thread.cpp:55
#26 0x000000001edc082c in Poco::ThreadImpl::runnableEntry (pThread=0x7b4400019168) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#27 0x000000000b2b6f19 in __tsan_thread_start_func ()
#28 0x00007fa19dd04609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#29 0x00007fa19dc29133 in clone () from /lib/x86_64-linux-gnu/libc.so.6
I looked through the LLVM code, and it looks like this thread is trying to lock the same mutex a second time:
first in __tsan::OnUserFree (frame #9), which the signal then interrupts: https://github.com/llvm/llvm-project/blob/12e137ab24dae51553433e00ff96e28a14d5b1f5/compiler-rt/lib/tsan/rtl/tsan_mman.cpp#L272
and then again in __tsan::Acquire (frame #3), called from __tsan::CallUserSignalHandler inside the signal handler: https://github.com/llvm/llvm-project/blob/f831d6fc800ccf22c1c09888fce3e3c8ebc2c992/compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp#L448
If I understand correctly, SlotLocker locks thr->slot->mtx, which is a __sanitizer::Mutex, and that mutex is not recursive, so the second acquisition on the same thread blocks forever (frames #0 to #2).
The binary was built with Clang 14.0.5 (build log).