[RFC] Fix jemalloc assertion due to non-monotonic CLOCK_MONOTONIC_COARSE#66439
[RFC] Fix jemalloc assertion due to non-monotonic CLOCK_MONOTONIC_COARSE#66439antonio2368 merged 2 commits intoClickHouse:masterfrom
Conversation
|
This is an automated comment for commit 669ce7c with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
1f6a0b4 to
0aa4dbd
Compare
|
Let's enable profiler back for debug so we don't have different configurations |
0aa4dbd to
65a00c7
Compare
|
If performance tests are happy then okay |
|
@azat Let's submit a fix to the Linux man page. It says that
|
|
I haven't finished yet:
By this I meant:
|
|
Depends on #66460 |
Recently one tricky assertion of jemalloc had been discovered [1]:
Failed assertion: "nstime_compare(&decay->epoch, new_time) <= 0"
[1]: ClickHouse#66193
And as it turns out it is really possible for CLOCK_MONOTONIC_COARSE to
go backwards, in a nutshell it can be done with ADJ_FREQUENCY, you can
find example here [2]. And I can't trigger this issue for non-coarse
clocks.
[2]: https://gist.github.com/azat/7ea7f50ed75591b1af2d675a240ea94c?permalink_comment_id=5119222#gistcomment-5119222
But, jemalloc do not call clock_gettime() that frequently (I've verified
it), so it can use non-coarse version - CLOCK_MONOTONIC
I've also measured the latency of CLOCK_MONOTONIC and
CLOCK_MONOTONIC_COARSE, and it is 20ns vs 4ns per call [3], so make this
change affect performance you need really frequently calls of
clock_gettime.
[3]: https://gist.github.com/azat/622fa1f9a5d8e7d546ee9d294501961d?permalink_comment_id=5119245#gistcomment-5119245
Interesting, that this bug started to appears only after jemalloc heap
profiler had been enabled by default [4], no clue why (I would believe
more in a more frequent calls to clock_adjtime(ADJ_FREQUENCY), but I
can't verify this)
[4]: ClickHouse#65702
To be continued...
Fixes: ClickHouse#66193
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Since the issue with jemalloc assertion is clear, let's revert that workaround Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
42ff7c4 to
669ce7c
Compare
|
|
@azat, your pull request is mistakenly classified as a bugfix, but it does not fix any wrong behavior in production ClickHouse builds. It does not even have tests to prove that. Our CI pointed to that. Please change the classification. |
|
I'm going to merge as test failures seem unrelated and performance tests are okay |
Changelog category (leave one):
Fix jemalloc assertion due to non-monotonic CLOCK_MONOTONIC_COARSE
Recently one tricky assertion of jemalloc had been discovered 1:
And as it turns out it is really possible for CLOCK_MONOTONIC_COARSE to go backwards, in a nutshell it can be done with ADJ_FREQUENCY, you can find example here 2. And I can't trigger this issue for non-coarse clocks.
But, jemalloc do not call clock_gettime() that frequently (I've verified it), so it can use non-coarse version - CLOCK_MONOTONIC
I've also measured the latency of CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE, and it is 20ns vs 4ns per call 3, so make this change affect performance you need really frequently calls of clock_gettime.
Interesting, that this bug started to appears only after jemalloc heap profiler had been enabled by default 4, no clue why (I would believe more in a more frequent calls to clock_adjtime(ADJ_FREQUENCY), but I can't verify this)
To be continued...
Fixes: #66193 (cc @antonio2368 )