Conversation
|
This is an automated comment for commit f1337e8 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
|
@antonio2368 And we can enable it dynamiclly during running. |
|
@zhanglistar what do you mean? |
|
No performance impact! |
|
@azat I enabled I just now noticed the error log advising against enabling it. Does it still hold? |
Kind of, it simply does not make sense The question is, why it deadlocks, I've tried to disable it but vanila server with a simple |
|
+1 for @zhanglistar comment Why it has been enabled by default, instead of simply |
|
The same thing is described here https://clickhouse.com/docs/en/operations/allocation-profiling#sampling-allocations-and-flushing-heap-profiles but the idea is to simplify and make it more accessible to use because right now it requires a restart to be able to apply such change. |
Recently one tricky assertion of jemalloc had been discovered [1]:
Failed assertion: "nstime_compare(&decay->epoch, new_time) <= 0"
[1]: ClickHouse#66193
And as it turns out it is really possible for CLOCK_MONOTONIC_COARSE to
go backwards, in a nutshell it can be done with ADJ_FREQUENCY, you can
find example here [2]. And I can't trigger this issue for non-coarse
clocks.
[2]: https://gist.github.com/azat/7ea7f50ed75591b1af2d675a240ea94c?permalink_comment_id=5119222#gistcomment-5119222
But, jemalloc do not call clock_gettime() that frequently (I've verified
it), so it can use non-coarse version - CLOCK_MONOTONIC
I've also measured the latency of CLOCK_MONOTONIC and
CLOCK_MONOTONIC_COARSE, and it is 20ns vs 4ns per call [3], so make this
change affect performance you need really frequently calls of
clock_gettime.
[3]: https://gist.github.com/azat/622fa1f9a5d8e7d546ee9d294501961d?permalink_comment_id=5119245#gistcomment-5119245
Interesting, that this bug started to appears only after jemalloc heap
profiler had been enabled by default [4], no clue why (I would believe
more in a more frequent calls to clock_adjtime(ADJ_FREQUENCY), but I
can't verify this)
[4]: ClickHouse#65702
To be continued...
Fixes: ClickHouse#66193
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Changelog category (leave one):
Let's try building jemalloc with profiler, and have it inactive by default. The overhead should be almost non-existent.
This will allow us to activate the profiler using SQL command without the need to change any env variables or restart.
Documentation entry for user-facing changes
CI Settings (Only check the boxes if you know what you are doing):