Fix data-races in client, session_timezone overrides and some data-races in some corner cases#82444
Conversation
|
Workflow [PR], commit [362026f] Summary: ❌
|
978a0ac to
2accd50
Compare
d45764f to
434c949
Compare
434c949 to
526becd
Compare
4499c45 to
4d969ed
Compare
I cannot find the reason, and it may be related (this changes broke some obscure places in the past), so need to take a deeper look, will fix some places in CI to get stacktraces |
Finally, after all debugging that I've added, I found it! DetailsAnd also one more data race in client reproduced in the Because we have deadlock for TSan reports! UPD: should be fixed by 362026f Details
Also can this be related? |
|
Looks better, but, data-race is not fixed yet, so we need to catch it again with proper TSan report |
…se_query Got it while running 1000 tests with 32 concurrency
The problem is that clietn calls setSettings() in the global settings, but this produce a data-race with accessing global settings all over the places [1]: - ThreadStatus::finalizePerformanceCounters() - ThreadStatus::initGlobalProfiler() [1]: ClickHouse#81893 (comment) v2: Initialize query_id in a proper context v3: Fix apply settings from config on the client v4 + v5: Fix other settings adjustments
This time I was fixing it for session_timezone, but there are other reasons to have it properly configured.
Likely due to MainThreadStatus initialized earlier, the limit is tracked more earlier, and so we need to increase it v2: increase one more time due to CI failures CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=82444&sha=526becdd696da6c6ffbddf8a48655251dbe40d70&name_0=PR&name_1=Fast%20test
Though not sure that it worth it, previosuly code was more generic
…Check()
This fixes possible deadlock in clickhouse-client [1]:
Thread 2 (Thread 0x7efe0c3e0640 (LWP 293871) "clickhouse-clie"):
0 0x00007efe0cb9081c in read () from /lib/x86_64-linux-gnu/libc.so.6
1 0x0000558d24646b2f in DB::ReadBufferFromFileDescriptor::readImpl (this=this@entry=0x7efe0c3d8710, to=0x7efe0c3d80b0 "", min_bytes=min_bytes@entry=1, max_bytes=536, offset=0) at ./ci/tmp/build/./src/IO/ReadBufferFromFileDescriptor.cpp:73
2 0x0000558d2464725e in DB::ReadBufferFromFileDescriptor::nextImpl (this=0x7efe0c3d8710) at ./ci/tmp/build/./src/IO/ReadBufferFromFileDescriptor.cpp:122
3 0x0000558d1658bd32 in DB::ReadBuffer::next() ()
4 0x0000558d24b304c4 in DB::ReadBuffer::eof (this=0x7efe0c3d8710) at ./src/IO/ReadBuffer.h:96
5 SignalListener::run (this=<optimized out>) at ./ci/tmp/build/./src/Common/SignalHandlers.cpp:309
6 0x0000558d371ed02e in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at ./base/poco/Foundation/src/Thread_POSIX.cpp:335
7 0x00007efe0cb10ac3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
8 0x00007efe0cba2850 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
Thread 1 (Thread 0x7efe0c990b80 (LWP 293870) "clickhouse-clie"):
0 0x00007efe0cb0d2c0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007efe0cb14002 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libc.so.6
2 0x0000558d246086d6 in pthread_mutex_lock (arg=0x558d45420940 <(anonymous namespace)::getLoggerMutex()::$_0::operator()() const::buffer>) at ./ci/tmp/build/./src/Common/ThreadFuzzer.cpp:446
3 0x0000558d3a6db189 in std::__1::__libcpp_mutex_lock[abi:ne190107](pthread_mutex_t*) (__m=0x558d45420940 <(anonymous namespace)::getLoggerMutex()::$_0::operator()() const::buffer>) at ./contrib/llvm-project/libcxx/include/__thread/support/pthread.h:95
4 std::__1::mutex::lock (this=0x558d45420940 <(anonymous namespace)::getLoggerMutex()::$_0::operator()() const::buffer>) at ./ci/tmp/build/./contrib/llvm-project/libcxx/src/mutex.cpp:29
5 0x0000558d371a9df5 in std::__1::lock_guard<std::__1::mutex>::lock_guard[abi:ne190107](std::__1::mutex&) (this=0x7ffdcc075d00, __m=...) at ./contrib/llvm-project/libcxx/include/__mutex/lock_guard.h:33
6 Poco::Logger::getShared (name=..., should_be_owned_by_shared_ptr_if_created=false) at ./ci/tmp/build/./base/poco/Foundation/src/Logger.cpp:355
7 0x0000558d245bbfb4 in getLogger<14ul> (name=...) at ./src/Common/Logger.h:41
8 MemoryTracker::debugLogBigAllocationWithoutCheck (this=<optimized out>, size=21580033) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:222
9 0x0000558d245bd0a7 in MemoryTracker::allocImpl (this=0x558d45270360 <DB::MainThreadStatus::getInstance()::thread_status+80>, size=21580033, throw_if_memory_exceeded=<optimized out>, query_tracker=0x0, _sample_probability=-1) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:400
10 0x0000558d2460aab3 in DB::ThreadStatus::flushUntrackedMemory (this=0x558d45270310 <DB::MainThreadStatus::getInstance()::thread_status>) at ./ci/tmp/build/./src/Common/ThreadStatus.cpp:245
11 0x0000558d24480b1e in CurrentMemoryTracker::allocImpl (size=64, throw_if_memory_exceeded=false) at ./ci/tmp/build/./src/Common/CurrentMemoryTracker.cpp:67
12 0x0000558d2447c4b9 in trackMemory<> (size=64, trace=...) at ./src/Common/memory.h:210
13 operator new (size=94065240705344) at ./ci/tmp/build/./src/Common/new_delete.cpp:53
14 0x0000558d371a9b67 in Poco::Logger::unsafeGet (name=..., get_shared=true) at ./ci/tmp/build/./base/poco/Foundation/src/Logger.cpp:396
15 0x0000558d371a9e02 in Poco::Logger::getShared (name=..., should_be_owned_by_shared_ptr_if_created=false) at ./ci/tmp/build/./base/poco/Foundation/src/Logger.cpp:356
16 0x0000558d245bbfb4 in getLogger<14ul> (name=...) at ./src/Common/Logger.h:41
17 MemoryTracker::debugLogBigAllocationWithoutCheck (this=<optimized out>, size=21580033) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:222
18 0x0000558d245bd0a7 in MemoryTracker::allocImpl (this=0x558d45270360 <DB::MainThreadStatus::getInstance()::thread_status+80>, size=21580033, throw_if_memory_exceeded=<optimized out>, query_tracker=0x0, _sample_probability=-1) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:400
19 0x0000558d2460aab3 in DB::ThreadStatus::flushUntrackedMemory (this=0x558d45270310 <DB::MainThreadStatus::getInstance()::thread_status>) at ./ci/tmp/build/./src/Common/ThreadStatus.cpp:245
20 0x0000558d2f8785c1 in DB::ThreadStatus::detachFromGroup (this=0x558d45270310 <DB::MainThreadStatus::getInstance()::thread_status>) at ./ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:372
21 0x0000558d2f87be42 in DB::CurrentThread::detachFromGroupIfNotDetached () at ./ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:716
22 DB::CurrentThread::QueryScope::~QueryScope (this=<optimized out>) at ./ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:755
23 0x0000558d24868a98 in std::__1::__optional_destruct_base<DB::CurrentThread::QueryScope, false>::~__optional_destruct_base[abi:ne190107]() (this=0xfffffffffffffe00) at ./contrib/llvm-project/libcxx/include/optional:293
24 DB::Client::~Client (this=0x7ffdcc076ce0) at ./ci/tmp/build/./programs/client/Client.cpp:73
25 0x0000558d24883ac4 in mainEntryClickHouseClient (argc=227, argv=<optimized out>) at ./ci/tmp/build/./programs/client/Client.cpp:1140
26 0x0000558d163ae1e1 in main (argc_=<optimized out>, argv_=<optimized out>) at ./ci/tmp/build/./programs/main.cpp:338
[1]: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=82444&sha=f3fc8c98f4994d88b927cebfdc31c4bcba3d2a6f&name_0=PR&name_1=Stress%20test%20%28amd_ubsan%29
Otherwise it may lead to data-race:
WARNING: ThreadSanitizer: data race (pid=26127)
Write of size 6 at 0x7298000099e8 by main thread:
0 __tsan_memcpy <null> (clickhouse-82444-tsan+0x8be71be) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
1 DB::SettingsTraits::Data::operator=(DB::SettingsTraits::Data const&) ci/tmp/build/./src/Core/Settings.cpp:6978:1 (clickhouse-82444-tsan+0x192ae4f9) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
2 DB::BaseSettings<DB::SettingsTraits>::operator=(DB::BaseSettings<DB::SettingsTraits> const&) ci/tmp/build/./src/Core/BaseSettings.h:109:60 (clickhouse-82444-tsan+0x190f2bcc) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
3 DB::SettingsImpl::operator=(DB::SettingsImpl const&) ci/tmp/build/./src/Core/Settings.cpp:6984:8 (clickhouse-82444-tsan+0x190f2bcc)
4 DB::Settings::operator=(DB::Settings const&) ci/tmp/build/./src/Core/Settings.cpp:7191:11 (clickhouse-82444-tsan+0x190f2bcc)
5 DB::Context::setSettings(DB::Settings const&) ci/tmp/build/./src/Interpreters/Context.cpp:2635:15 (clickhouse-82444-tsan+0x1aa94c0a) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
6 DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0::operator()() const ci/tmp/build/./src/Client/ClientBase.cpp:2184:9 (clickhouse-82444-tsan+0x1df0f4d7) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
7 BasicScopeGuard<DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0>::invoke() ci/tmp/build/./base/base/../base/scope_guard.h:101:9 (clickhouse-82444-tsan+0x1df0f4d7)
8 BasicScopeGuard<DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0>::~BasicScopeGuard() ci/tmp/build/./base/base/../base/scope_guard.h:50:26 (clickhouse-82444-tsan+0x1df0f4d7)
9 DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long) ci/tmp/build/./src/Client/ClientBase.cpp:2249:5 (clickhouse-82444-tsan+0x1df021ca) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
Previous read of size 1 at 0x7298000099e8 by thread T4:
#0 DB::SettingFieldNumber<bool>::operator bool() const ci/tmp/build/./src/Core/SettingsFields.h:40:36 (clickhouse-82444-tsan+0x1b114e5f) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
ClickHouse#1 DB::ThreadStatus::finalizePerformanceCounters() ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:547:17 (clickhouse-82444-tsan+0x1b114e5f)
ClickHouse#2 DB::ThreadStatus::detachFromGroup() ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:375:5 (clickhouse-82444-tsan+0x1b113aec) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
ClickHouse#3 DB::CurrentThread::detachFromGroupIfNotDetached() ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:716:21 (clickhouse-82444-tsan+0x1b1119b3) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
ClickHouse#4 DB::ThreadGroupSwitcher::~ThreadGroupSwitcher() ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:261:9 (clickhouse-82444-tsan+0x1b1119b3)
ClickHouse#5 DB::ThreadPoolCallbackRunnerLocal<void, ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>, std::__1::function<void ()>>::operator()(std::__1::function<void ()>&&, Priority, std::__1::optional<unsigned long>)::'lambda'()::operator()() ci/tmp/build/./src/Common/threadPoolCallbackRunner.h:179:9 (clickhouse-82444-tsan+0x14beafde) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
Thread T4 'ThreadPool' (tid=26133, running) created by thread T3 at:
...
12 DB::ThreadPoolCallbackRunnerLocal<void, ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>, std::__1::function<void ()>>::operator()(std::__1::function<void ()>&&, Priority, std::__1::optional<unsigned long>) ci/tmp/build/./src/Common/threadPoolCallbackRunner.h:188:22 (clickhouse-82444-tsan+0x14be522d) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
13 DB::ParallelParsingInputFormat::scheduleParserThreadForUnitWithNumber(unsigned long) ci/tmp/build/./src/Processors/Formats/Impl/ParallelParsingInputFormat.h:287:9 (clickhouse-82444-tsan+0x1e495627) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
14 DB::ParallelParsingInputFormat::segmentatorThreadFunction(std::__1::shared_ptr<DB::ThreadGroup>) ci/tmp/build/./src/Processors/Formats/Impl/ParallelParsingInputFormat.cpp:45:13 (clickhouse-82444-tsan+0x1e495627)
Refs: https://pastila.nl/?0001fdef/9a58e9d59c32d45100a481de26dccf68#T4iNrMFUnu4F2hWST5wLdQ==
d8f311f to
362026f
Compare
|
|
OK, I think I fixed all issues that I saw, @Algunenano want to take a final look? P.S. the history should be clean, I've removed all the testing commits (including merges) |
Algunenano
left a comment
There was a problem hiding this comment.
LGTM. I wonder why we have profile events counters for tasks in the customer (related to 362026f). I doubt we do anything with them
e90e5e1
Cherry pick #82444 to 25.6: Fix data-races in client, session_timezone overrides and some data-races in some corner cases
…verrides and some data-races in some corner cases
Backport #82444 to 25.6: Fix data-races in client, session_timezone overrides and some data-races in some corner cases
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix data-races in client (by not using global context) and
session_timezoneoverrides (previously in case ofsession_timezonewas set in i.e.users.xml/client options to non empty and in query context to empty, then, value fromusers.xmlwas used, while this is wrong, now query context will always have a priority over global context)The problem is that client calls
setSettings()in the global settings, but this produce a data-race with accessing global settings all over the places 1:ThreadStatus::finalizePerformanceCounters()ThreadStatus::initGlobalProfiler()But since now we do not set global context anymore, we need to use set proper query context for thread, to make
session_timeoutwork after i.e.SET session_timeout='UTC'orSELECT now() SETTINGS session_timeout='UTC'Also after client stopped using global context, one more issue pops up - incorrect handling of
session_timezone, for client it was always OK, since client always adjusted global context before, but for server, if query context overrides it to empty value, while it was non-empty in i.e.users.xml, the value from global context was used.And also various other UBs in some corner cases had been fixed that had been found during running CI on this PR
This is actually a proper fix over #81759, in addition to incomplete #82233
Fixes: #81893