Can you rebase on top of #48600?
c2c1855 to 1b22c5d
Is this intentionally on top of #49644?
cf8d9fb to a9ad110
vchuravy left a comment
Can you disentangle this from #49644 so that we can study #48969 (comment) independently?
Should be independent of #49644 now.
The latest commit should allow part of the sweeping of object pools to run concurrently with mutator threads, regardless of whether we have GC threads or not (e.g. a program running with …). The cost is, of course, more contention on …
I think the solution is to do away with that perm_lock. It doesn't seem too complicated to remove it and switch to compare-and-swap (CAS) instead.
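To make the compare-and-swap suggestion concrete, here is a minimal sketch of a lock-free shared free-page list in C11 atomics. It assumes a hypothetical page_node_t / free_list_t pair invented for illustration; these are not Julia's actual GC structures, and perm_lock itself is not modeled. Concurrent sweepers push with a CAS retry loop, and a single consumer detaches the whole list with one atomic exchange, which avoids the ABA problem that a CAS-based single-node pop would have.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node for a page a sweeper found to be free
   (illustrative only, not Julia's GC data structure). */
typedef struct page_node {
    struct page_node *next;
    void *addr;
} page_node_t;

/* Head of a shared list of free pages, updated without a lock. */
typedef struct {
    _Atomic(page_node_t *) head;
} free_list_t;

static void free_list_init(free_list_t *l)
{
    atomic_init(&l->head, NULL);
}

/* Push one node with a CAS retry loop instead of taking a lock.
   Safe to call from many sweeping threads at once. */
static void free_list_push(free_list_t *l, page_node_t *n)
{
    page_node_t *old = atomic_load_explicit(&l->head, memory_order_relaxed);
    do {
        /* Link to the head we last observed; on CAS failure `old`
           is refreshed with the current head and we retry. */
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the whole list in one atomic exchange. Taking everything at
   once (rather than popping single nodes with CAS) sidesteps ABA. */
static page_node_t *free_list_take_all(free_list_t *l)
{
    return atomic_exchange_explicit(&l->head, NULL, memory_order_acquire);
}
```

The release/acquire pairing makes each node's next and addr fields visible to whichever thread later takes the list. Whether this shape fits the real pool-sweeping code is an open assumption.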
799f3fd to ca57083
74b4c6f to e060962
Both are good properties, so if there's (necessarily) a trade-off, can it still be merged with it off by default, plus an ENV var to enable it for low GC pauses? While you want to avoid allocations (and the GC entirely) for real-time, that's hard to do fully, and shorter pauses are very valuable for soft real-time. It's just a question of what to call this ENV var: CONCURRENT_SWEEP_GC (or e.g. SOFT_REAL-TIME_GC)?
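An off-by-default opt-in could look roughly like the sketch below. It assumes the proposed CONCURRENT_SWEEP_GC name and treats only the exact value "1" as "enabled"; both the accepted values and the helper names are hypothetical, not an existing Julia option. Parsing is split from getenv so the default-off behavior is easy to check.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: interpret the value of the proposed CONCURRENT_SWEEP_GC
   variable. Unset (NULL) or any value other than "1" keeps the default,
   which is concurrent sweeping OFF. */
static bool parse_concurrent_sweep(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Read the environment once, e.g. during GC initialization. */
static bool concurrent_sweep_enabled(void)
{
    return parse_concurrent_sweep(getenv("CONCURRENT_SWEEP_GC"));
}
```

A runtime command-line flag, if added, would presumably take precedence over the environment variable, but that ordering is a design choice left open here.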
vchuravy left a comment
This PR brings real and tangible benefits for multi-threaded code that allocates, by significantly shortening the STW (stop-the-world) phase and therefore improving scalability (Amdahl's law says hi).
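As a rough illustration of the Amdahl's law point, treat the STW phase as (part of) the serial fraction $1-p$ of a run on $N$ threads. This is a simplification, and the fractions below are hypothetical numbers, not measurements from this PR:

```latex
S(N) = \frac{1}{(1-p) + \frac{p}{N}}

% Hypothetical: N = 16 threads
% serial fraction 1-p = 0.10:  S = 1 / (0.10 + 0.90/16) = 1 / 0.15625  = 6.4
% serial fraction 1-p = 0.05:  S = 1 / (0.05 + 0.95/16) = 1 / 0.109375 \approx 9.14
```

Halving the serial share in this toy example lifts the attainable speedup from 6.4x to about 9.1x, which is why moving sweeping work out of the STW phase matters more as thread counts grow.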
I believe we should add an environment variable and a runtime flag for this feature.
On systems vulnerable to Meltdown and Spectre, KPTI can cause iTLB flushes. With concurrent page sweeping, instead of paying this cost "once", we will concurrently invalidate the iTLB, leading to a loss of runtime performance.
In particular, for the GCBenchmark tree_multable, I saw an increase in CPU time spent in __madvise and in asm_sysvec_call_function on the threads that are not running concurrent GC.
@kpamnany also voiced discomfort with the system being oversubscribed.
I also found it counter-intuitive that --gcthreads=1 would disable concurrent page sweeping.
In the long term, the open questions for me are:
- Could we implement this with the tasking system, e.g. schedule a task that will do some cleanup work?
- We could try out io_uring for batching the madvise calls, but that would be significant work.
- If concurrent sweeping is disabled, we could run this after the STW phase has ended, but before the finalizers. This would alleviate some of @kpamnany's oversubscription concerns, while still moving the cost out of the STW phase.
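A cheaper form of batching than io_uring (a different, swapped-in technique) is to coalesce adjacent free pages so that one madvise call covers a whole run instead of one call per page. The sketch below assumes a hypothetical per-region array of "page is free" flags and a page-aligned region base; it is not how Julia's GC actually tracks page state. Whether fewer, larger calls reduce the TLB-shootdown cost observed above would need measuring.

```c
#define _DEFAULT_SOURCE /* expose madvise() and MADV_DONTNEED on glibc */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* A run of consecutive free pages: [first, first + count). */
typedef struct {
    size_t first;
    size_t count;
} page_run_t;

/* Group adjacent free pages into runs. `out` needs room for
   (npages + 1) / 2 runs, the worst case of alternating free/used pages.
   Returns the number of runs written. */
static size_t coalesce_free_pages(const bool *is_free, size_t npages,
                                  page_run_t *out)
{
    size_t nruns = 0;
    size_t i = 0;
    while (i < npages) {
        if (!is_free[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < npages && is_free[i])
            i++;
        out[nruns].first = start;
        out[nruns].count = i - start;
        nruns++;
    }
    return nruns;
}

/* Issue one madvise() per run instead of one per page. `base` must be
   page-aligned. Returns 0 on success, -1 on the first failure. */
static int release_runs(char *base, size_t page_size,
                        const page_run_t *runs, size_t nruns)
{
    for (size_t r = 0; r < nruns; r++) {
        if (madvise(base + runs[r].first * page_size,
                    runs[r].count * page_size, MADV_DONTNEED) != 0)
            return -1;
    }
    return 0;
}
```

For example, flags {free, free, used, free, free, free, used, free} collapse into three madvise calls rather than six.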
To keep things consistent with … Open to suggestions on that.
Bump.
Extends #48600 by making sweeping of object pools concurrent.