Run GC on multiple threads #48600
|
I assume this depends on #48123 for good performance on AMD? |
|
The number of GC threads can be specified through It would be nice to run CI with multiple GC threads (worst case, I think we can choose a higher default count of GC threads for testing purposes). |
|
Same for PkgEval. |
You can do: |
The |
|
@nanosoldier |
|
Your package evaluation job has completed - possible new issues were detected. |
|
Will merge this tomorrow if there are no further comments. |
Using a work-stealing queue after Chase and Lev, optimized for weak memory models by Le et al.
Default number of GC threads is half the number of compute threads.
Co-authored-by: Gabriel Baraldi <baraldigabriel@gmail.com>
Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
|
Just checking if #49545 (comment) should be understood first? |
|
I don't think so. It appears to be totally unrelated. |
Summary
This PR parallelizes the GC mark-loop by introducing GC threads into the Julia runtime and by implementing work-stealing to dynamically balance the amount of work each thread performs in the GC mark-loop.
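To make the work-stealing idea concrete, here is a minimal sketch of a Chase-Lev style deque in C11 atomics, with memory orderings in the spirit of Le et al. It is not the code from this PR: the names (`ws_queue_t`, `wsq_push`, `wsq_pop`, `wsq_steal`) and the fixed capacity `WSQ_CAPACITY` are hypothetical, and the buffer never grows. The owning thread pushes and pops at the bottom, while idle threads steal from the top.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define WSQ_CAPACITY 1024 /* hypothetical fixed capacity; this sketch never resizes */

typedef struct {
    _Atomic int64_t top;    /* index of the next item a thief would steal */
    _Atomic int64_t bottom; /* index of the next free slot for the owner */
    _Atomic(void *) buf[WSQ_CAPACITY];
} ws_queue_t;

static void wsq_init(ws_queue_t *q) {
    atomic_init(&q->top, 0);
    atomic_init(&q->bottom, 0);
    for (size_t i = 0; i < WSQ_CAPACITY; i++)
        atomic_init(&q->buf[i], NULL);
}

/* Owner only. Returns 1 on success, 0 if the queue is full. */
static int wsq_push(ws_queue_t *q, void *item) {
    assert(item != NULL); /* NULL is reserved to mean "nothing available" */
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed);
    int64_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    if (b - t >= WSQ_CAPACITY)
        return 0;
    atomic_store_explicit(&q->buf[b % WSQ_CAPACITY], item, memory_order_relaxed);
    /* Make the item visible before publishing the new bottom. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    return 1;
}

/* Owner only. Takes the most recently pushed item (LIFO), or NULL if empty. */
static void *wsq_pop(ws_queue_t *q) {
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed) - 1;
    atomic_store_explicit(&q->bottom, b, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    int64_t t = atomic_load_explicit(&q->top, memory_order_relaxed);
    void *item = NULL;
    if (t <= b) {
        item = atomic_load_explicit(&q->buf[b % WSQ_CAPACITY], memory_order_relaxed);
        if (t == b) {
            /* Last item: race against thieves with a CAS on top. */
            if (!atomic_compare_exchange_strong_explicit(
                    &q->top, &t, t + 1,
                    memory_order_seq_cst, memory_order_relaxed))
                item = NULL; /* a thief won the race */
            atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
        }
    } else {
        /* Queue was empty: restore bottom. */
        atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    }
    return item;
}

/* Any thread. Takes the oldest item (FIFO), or NULL if empty or the race was lost. */
static void *wsq_steal(ws_queue_t *q) {
    int64_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    atomic_thread_fence(memory_order_seq_cst);
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_acquire);
    void *item = NULL;
    if (t < b) {
        item = atomic_load_explicit(&q->buf[t % WSQ_CAPACITY], memory_order_relaxed);
        if (!atomic_compare_exchange_strong_explicit(
                &q->top, &t, t + 1,
                memory_order_seq_cst, memory_order_relaxed))
            return NULL; /* another thread took it first */
    }
    return item;
}
```

In a mark loop built on such a queue, a thread would call `wsq_pop` on its own queue until it returns `NULL`, then call `wsq_steal` on the queues of other threads. How the actual runtime picks victims and detects termination is outside the scope of this sketch.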
Implementation
Following #47292, each thread running the GC mark-loop manages two work-queues: one queue stores pointers to Julia objects that need to be scanned, and another queue (chunk queue) stores iterator states corresponding to suffixes of large arrays that need to be scanned.
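The split between the two queues can be illustrated with a small single-threaded sketch in C. All names here (`chunk_t`, `mark_queues_t`, `scan_chunk`) and the constants `CHUNK_LEN` and `QUEUE_CAP` are hypothetical and are not taken from the Julia runtime. The queues are plain arrays for readability, whereas the real ones must tolerate concurrent stealing. A chunk records the unscanned suffix of an array. Scanning one chunk looks at a bounded number of elements, pushes the references it finds onto the pointer queue, and re-queues the remaining suffix as a new chunk.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_LEN 4  /* hypothetical: elements scanned per step */
#define QUEUE_CAP 64 /* hypothetical: capacity of each toy queue */

/* Iterator state for the not-yet-scanned suffix [begin, end) of an array. */
typedef struct {
    void **begin;
    void **end;
} chunk_t;

/* Per-thread marking state: one pointer queue and one chunk queue. */
typedef struct {
    void   *ptrs[QUEUE_CAP];
    size_t  nptrs;
    chunk_t chunks[QUEUE_CAP];
    size_t  nchunks;
} mark_queues_t;

/* Scan at most CHUNK_LEN elements of chunk c. */
static void scan_chunk(mark_queues_t *mq, chunk_t c) {
    void **stop = (c.end - c.begin > CHUNK_LEN) ? c.begin + CHUNK_LEN : c.end;
    if (stop < c.end) {
        /* Queue the remaining suffix first; in a parallel setting this
         * would let another thread pick it up while we scan our part. */
        assert(mq->nchunks < QUEUE_CAP);
        mq->chunks[mq->nchunks++] = (chunk_t){stop, c.end};
    }
    for (void **p = c.begin; p < stop; p++) {
        if (*p != NULL) {
            /* Non-NULL elements are references that still need scanning. */
            assert(mq->nptrs < QUEUE_CAP);
            mq->ptrs[mq->nptrs++] = *p;
        }
    }
}
```

Bounding the work per chunk keeps a single very large array from being scanned entirely by one thread, since the queued suffix is available as an independent unit of work.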
Both the pointer and chunk queues are lock-free and are based on the work of Chase and Lev and of Le et al. (see the papers referenced in work-stealing-queue.h).
Results
These are speedups in mark-time for a tweaked (JuliaCI/GCBenchmarks#61) version of the rb_tree.jl benchmark from GCBenchmarks. This representative benchmark uses a single mutator thread (we scale the number of GC threads in the plots below). For more benchmarks, see @vchuravy's comment below.
Machine
Speedups