Fix possible data-race StorageKafka with statistics_interval_ms>0 #66311
alexey-milovidov merged 1 commit into ClickHouse:master
aac1db2 to 076d2bb
@azat, we have fixed a bunch of tests. Could you please merge with master?
Likely part of the problem is gdb, and another part is the CI wrappers: #66036 (comment)
"Fixed" in ba176a9
076d2bb to d8865a6
@azat, there is an MSan failure in gRPC that has to be fixed.
It has been fixed in #66509
@azat, please merge with the master branch to incorporate these fixes into this PR.
d8865a6 to 4adca97
4adca97 to 316dd79
Unfortunately, it blocks the merge.
Thank you for the fix. We will try to merge this dependent PR ASAP.
316dd79 to c2e8f3a
@azat There was a mistake in the CI infrastructure related to retries.
That is likely fixed in #67738. OK. Rebased.
c2e8f3a to 1e37ae2
The FreeBSD build got broken: #68014
The problem here is that the ignorelist did not work for some reason. If I
look at the ignored functions, they should not contain any TSan
interception code, but they do:
$ lldb-13 clickhouse
(lldb) target create "clickhouse"
disas -n rd_avg_rollover
Current executable set to '/home/azat/ch/tmp/tsan-test/clickhouse' (x86_64).
(lldb) disas -n rd_avg_rollover
clickhouse`rd_kafka_stats_emit_avg:
clickhouse[0x1cbf84a7] <+39>: leaq 0x30(%r15), %r12
clickhouse[0x1cbf84ab] <+43>: movq %r12, %rdi
clickhouse[0x1cbf84ae] <+46>: callq 0x1ccdad40 ; rdk_thread_mutex_lock at tinycthread.c:111
clickhouse[0x1cbf84b3] <+51>: leaq 0x58(%r15), %rdi
clickhouse[0x1cbf84b7] <+55>: callq 0x71b5390 ; __tsan_read4
clickhouse[0x1cbf84bc] <+60>: cmpl $0x0, 0x58(%r15)
clickhouse[0x1cbf84c1] <+65>: je 0x1cbf8595 ; <+277> [inlined] rd_avg_rollover + 238 at rdavg.h
clickhouse[0x1cbf84c7] <+71>: leaq -0xc8(%rbp), %rdi
clickhouse[0x1cbf84ce] <+78>: xorl %esi, %esi
clickhouse[0x1cbf84d0] <+80>: callq 0x1ccdac80 ; rdk_thread_mutex_init at tinycthread.c:62
clickhouse[0x1cbf84d5] <+85>: leaq 0x5c(%r15), %rdi
clickhouse[0x1cbf84d9] <+89>: callq 0x71b5390 ; __tsan_read4
(lldb) disas -n rd_avg_calc
clickhouse`rd_kafka_broker_ops_io_serve:
clickhouse[0x1cbdf086] <+1990>: leaq 0x5a4(%rbx), %rdi
clickhouse[0x1cbdf08d] <+1997>: callq 0x71b5390 ; __tsan_read4
clickhouse[0x1cbdf092] <+2002>: cmpl $0x0, 0x5a4(%rbx)
clickhouse[0x1cbdf099] <+2009>: je 0x1cbdf12b ; <+2155> [inlined] rd_kafka_broker_timeout_scan + 719 at rdkafka_broker.c
I guess the reason is that they have been inlined.
So now rd_avg_calc() is guarded with a mutex.
Refs: ClickHouse/librdkafka#11
Fixes: ClickHouse#60939
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
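For illustration only, here is a minimal sketch of the general pattern the commit message describes: the reader takes the mutex before it checks the "enabled" flag, so the check and the calculation cannot race with a writer. This is not the actual patch (see ClickHouse/librdkafka#11 for that). The `demo_avg_t` type and the `demo_avg_*` functions are hypothetical, simplified stand-ins and are not librdkafka's real `rd_avg_t` API.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an averaging counter.
 * Not librdkafka's real rd_avg_t layout. */
typedef struct demo_avg {
    pthread_mutex_t lock; /* protects every field below */
    int enabled;          /* non-zero when statistics are collected */
    int64_t sum;          /* sum of recorded samples */
    int cnt;              /* number of recorded samples */
} demo_avg_t;

static void demo_avg_init(demo_avg_t *a, int enabled) {
    pthread_mutex_init(&a->lock, NULL);
    a->enabled = enabled;
    a->sum = 0;
    a->cnt = 0;
}

/* Writer side: records a sample under the lock. */
static void demo_avg_add(demo_avg_t *a, int64_t v) {
    pthread_mutex_lock(&a->lock);
    if (a->enabled) {
        a->sum += v;
        a->cnt++;
    }
    pthread_mutex_unlock(&a->lock);
}

/* Reader side: the lock is taken BEFORE the enabled check, so the
 * flag and the counters are read consistently. Returns 0 when the
 * counter is disabled or empty. */
static int64_t demo_avg_calc(demo_avg_t *a) {
    int64_t avg = 0;
    pthread_mutex_lock(&a->lock);
    if (a->enabled && a->cnt > 0)
        avg = a->sum / a->cnt;
    pthread_mutex_unlock(&a->lock);
    return avg;
}

static void demo_avg_destroy(demo_avg_t *a) {
    pthread_mutex_destroy(&a->lock);
}
```

With both sides holding the same mutex, TSan sees a happens-before relation between the writer and the reader, so no ignorelist entry is needed for this access.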
1e37ae2 to 301ac5d
I will manually set the mergeable check and merge this PR. PS: here are the two other failures in the past 10 days.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix possible data-race StorageKafka with statistics_interval_ms>0
The problem here is that the ignorelist did not work for some reason. If I look at the ignored functions, they should not contain any TSan interception code, but they do (see the lldb disassembly in the commit message above).
I guess the reason is that they have been inlined.
So now rd_avg_calc() is guarded with a mutex.
Refs: ClickHouse/librdkafka#11
Fixes: #60939
Follow-up for: #50999
Cc: @antaljanosbenjamin