Skip to content

Concurrent page sweeping#48969

Merged
d-netto merged 1 commit intoJuliaLang:masterfrom
d-netto:dcn/psweep
Jun 28, 2023
Merged

Concurrent page sweeping#48969
d-netto merged 1 commit intoJuliaLang:masterfrom
d-netto:dcn/psweep

Conversation

@d-netto
Copy link
Copy Markdown
Member

@d-netto d-netto commented Mar 11, 2023

Extends #48600 by making sweeping of object pools concurrent.

@d-netto d-netto added the GC Garbage collector label Mar 11, 2023
@d-netto d-netto marked this pull request as draft March 11, 2023 23:38
@vchuravy
Copy link
Copy Markdown
Member

Can you rebase on top of #48600?

@d-netto d-netto force-pushed the dcn/psweep branch 4 times, most recently from c2c1855 to 1b22c5d Compare May 9, 2023 02:53
@d-netto d-netto marked this pull request as ready for review May 10, 2023 19:04
@d-netto d-netto requested review from gbaraldi, vchuravy and vtjnash May 10, 2023 19:04
@d-netto d-netto changed the title WIP: Parallel sweeping Parallel/concurrent sweeping May 10, 2023
@oscardssmith
Copy link
Copy Markdown
Member

Is this intentionally on top of #49644?

@d-netto d-netto force-pushed the dcn/psweep branch 9 times, most recently from cf8d9fb to a9ad110 Compare May 14, 2023 15:33
@chriselrod

This comment was marked as outdated.

Copy link
Copy Markdown
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you disentangle this from #49644 so that we can study
#48969 (comment) independently?

@d-netto
Copy link
Copy Markdown
Member Author

d-netto commented May 15, 2023

Should be independent of #49644 now.

@d-netto
Copy link
Copy Markdown
Member Author

d-netto commented May 16, 2023

Latest commit should allow part of sweeping of object pools to run concurrently with mutator threads independently of whether we have GC threads or not (e.g. a program running with --threads=4 --gcthreads=1 could in theory benefit from it).

The cost if, of course, more contention on gc_perm_lock. This needs more careful analysis to confirm that we are not making GC pauses shorter at the cost of throughput.

@gbaraldi
Copy link
Copy Markdown
Member

gbaraldi commented Jun 6, 2023

I think the solution is to do away with that perm_lock. It doesn't seem too complicated to do that and switch to doing Compare and Swap.

@vchuravy vchuravy self-requested a review June 6, 2023 15:27
@d-netto d-netto force-pushed the dcn/psweep branch 8 times, most recently from 799f3fd to ca57083 Compare June 23, 2023 15:02
@d-netto d-netto force-pushed the dcn/psweep branch 7 times, most recently from 74b4c6f to e060962 Compare June 24, 2023 06:20
@PallHaraldsson
Copy link
Copy Markdown
Contributor

it seems like this PR can be detrimental to throughput even though it reduces GC pauses in a few cases.

Both are good properties, so if there's (necessarily) a trade-off, can it still me merged with it off by default, and an ENV var to enable for low GC pauses? While you want to avoid allocations, and the GC entirely, for real-time, it's hard to do fully, and shorter pauses very valuable for soft real-time. It's just a question what to call this ENV var, CONCURRENT_SWEEP_GC (or e.g. SOFT_REAL-TIME_GC)?

Copy link
Copy Markdown
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR brings real and tangible benefits for multi-threaded code that is allocating, by significantly shortening the STW phase, therefore improving scalabity (Amdahl's law says hi).

I believe we should add an environment and runtime flag for this feature.

On systems vulnerable to Meltdown&Spector KPTI can cause iTLB flushes. With concurrent page-sweeping instead of paying this cost "once"
we will concurrently invalidate the iTLB leading to runtime performance loss.

In particular for the GCBenchmark tree_multable I saw an increase in cpu-time being spent in __madvise and cpu-time being spent in asm_sysvec_call_function on the threads that are not running concurrent GC.

@kpamnany also voiced discomfort with the system being oversubscribed.

I also found it counter-intuitive that --gcthreads=1 would disable concurrent page sweeping.

In the long-term open questions for me are:

  • Could we implement this with the tasking system, e.g. schedule a task that will some cleanup work?
  • We could try out io_uring for batching the madvise calls, but that would be significant work.
  • If concurrent sweeping is disabled, we could run this after the STW phase ended, but before the finalizers. This would alleviate some of @kpamnany oversubscription concerns, while still moving the cost out of the STW phase.

@d-netto
Copy link
Copy Markdown
Member Author

d-netto commented Jun 25, 2023

To keep things consistent with --threads, I was thinking about something like --gcthreads=X,Y, with X being the number of threads that may run parallel marking and Y being the number of threads that may run concurrent page sweeping (0 or 1, chosen to be 0 by default).

Open to suggestions on that.

@d-netto
Copy link
Copy Markdown
Member Author

d-netto commented Jun 27, 2023

Bump.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

GC Garbage collector performance Must go faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants