Conversation
Force-pushed 0394148 to c90613b
ctk21 left a comment:
So I think this works, but it is quite a complex area given all the possible interactions: domain lifecycle, backup thread servicing, etc. I wonder how best to test it - that things work in the testsuite is good. Perhaps also run it through sandmark to check that the RSS and the total number of major cycles are not meaningfully changed?
I can see that triggering a best-effort slice without using the full STW mechanism is desirable to avoid synchronisation in advance_global_major_slice_epoch; on balance I don't think the added use of all_domains_lock in this PR is too much added complexity.
runtime/domain.c (outdated):
```c
     global one [caml_major_slice_epoch]. */
  for (int i = 0; i < stw_domains.participating_domains; i++) {
    dom_internal * di = stw_domains.domains[i];
    stw_request.participating[i] = di->state;
```
---
I don't think you should set this - it isn't needed, and it might also clobber stw_request, which isn't entirely protected by all_domains_lock in caml_try_run_on_all_domains_with_spin_work.
Remove this line?
---
Good catch. Looks like a copy-paste error. I'll remove this line.
I'll also remove the assertion on the next line:
`CAMLassert(!d->interruptor.interrupt_pending);` -- it is unclear to me why this is expected to hold.
runtime/domain.c (outdated):
```c
  /* This domain is the first one to use up half of its minor heap arena
     in this minor cycle. Trigger major slice on other domains. */
  if (caml_plat_try_lock(&all_domains_lock)) {
    /* Note that this interrupt is best-effort. If we get the lock,
       then interrupt all the domains. If not, either some other domain
       is calling for a stop-the-world section interrupting all the
       domains, or a domain is being created or terminated. All of these
       actions also try to lock [all_domains_lock] mutex, and the above
       lock acquisition may fail.

       If we don't get the lock, we don't interrupt other domains. This
       is acceptable since it does not affect safety but only liveness --
       the speed of the major gc. The other domains may themselves fill
       half of their minor heap triggering a major slice, or do it right
       after their next minor GC when they observe that their
       domain-local [Caml_state->major_slice_epoch] is less than the
       global one [caml_major_slice_epoch]. */
```
---
IIUC there is some involved logic to make sure that other domains work enough for the major GC before being forced into a minor collection.
Here is another possible approach (this is for the record, as I had the chance to propose this to KC verbally already).
- Every time a minor collection is needed while the minor heap is half-empty, do a stop&copy collection into the free half of the minor heap. This reduces the number of false promotions and (what is important here) avoids increasing the work debt for the major GC.
- Every time a minor collection is needed before the major slice is done, simply transfer the work debt to the next minor cycle.
Note: the major slice happens when the minor heap is half-full, so the two trigger conditions coincide. Therefore, the major slice can be delayed, but the work debt never accumulates.
In this approach, any additional work needed to keep up with phase changes (see ocaml#11589) should be accounted for separately from the slice-at-middle rendez-vous.
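As a side note, the stop&copy step proposed above can be sketched with a toy Cheney-style copier. This is only a minimal sketch under simplifying assumptions: every name here is hypothetical, objects have a single pointer field, and none of the real runtime's concerns (headers, tags, cross-domain pointers, parallel synchronisation) are modeled.

```c
/* Toy Cheney-style semispace copy (hypothetical layout: one payload
   word and one pointer field per object). */
#include <stddef.h>

#define TO_SPACE_SIZE 64

typedef struct obj {
  struct obj *fwd;    /* forwarding pointer, NULL if not yet copied */
  long payload;
  struct obj *field;  /* the single pointer field of this toy object */
} obj;

static obj to_space[TO_SPACE_SIZE];
static size_t to_top = 0;

/* Evacuate one object to to-space, installing a forwarding pointer
   so that sharing and cycles are preserved. */
static obj *copy(obj *o)
{
  if (o == NULL) return NULL;
  if (o->fwd != NULL) return o->fwd;   /* already evacuated */
  obj *n = &to_space[to_top++];
  *n = *o;
  n->fwd = NULL;
  o->fwd = n;
  return n;
}

/* Cheney scan: evacuate the root, then sweep to-space left to right,
   evacuating everything it points to. No recursion, no mark stack. */
static obj *collect(obj *root)
{
  size_t scan = to_top;
  obj *r = copy(root);
  while (scan < to_top) {
    to_space[scan].field = copy(to_space[scan].field);
    scan++;
  }
  return r;
}
```

The forwarding-pointer check in `copy` is what keeps the copy linear in the live data even in the presence of sharing and cycles.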
---
Hi @gadmm, I'm trying to understand how the proposal to perform stop & copy collection in the minor heap solves the current problem. I'd think that is an orthogonal improvement. Perhaps your intention is to document future GC improvements. Not sure what the best place for this would be. Perhaps ocaml/RFCs?
There are some (not insurmountable) challenges with implementing this copying collection for minor heap given that we may have pointers between the minor heap arenas of different domains. A parallel copying collector will have to synchronize to copy the objects to the to_space. While I've implemented simple Cheney copying collectors before, I don't know the state of the art in parallel copying collections.
Coming back to the lines of code under discussion, in practice, I would think that the best-effort interruption above would work well. If the lock is not obtained, the worst that can happen is that the major slice is pushed to after the next minor collection on the other domains.
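The best-effort scheme under discussion can be sketched as follows. This is an illustration only, with made-up names (`global_epoch`, `try_trigger_slice_on_all`, `slice_needed`), not the runtime's actual API: the request is published through an epoch counter before the try-lock, so even when the eager interrupt is skipped, each domain still notices the stale epoch after its next minor GC.

```c
/* Best-effort slice trigger: publish a new epoch, then interrupt all
   domains only if the lock is uncontended. Illustrative names only. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t all_domains_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long global_epoch;
static _Thread_local long local_epoch;

/* Returns true if the other domains were interrupted eagerly. */
static bool try_trigger_slice_on_all(void (*interrupt_all)(void))
{
  atomic_fetch_add(&global_epoch, 1);  /* publish the request first */
  if (pthread_mutex_trylock(&all_domains_lock) != 0)
    return false;  /* contended: fall back to the lazy epoch check */
  if (interrupt_all != NULL)
    interrupt_all();
  pthread_mutex_unlock(&all_domains_lock);
  return true;
}

/* Each domain calls this after its minor GC; a stale local epoch
   schedules the slice even when the eager interrupt was skipped. */
static bool slice_needed(void)
{
  long g = atomic_load(&global_epoch);
  if (local_epoch < g) { local_epoch = g; return true; }
  return false;
}
```

Publishing the epoch before attempting the lock is what makes losing the try-lock harmless: liveness degrades by at most one minor cycle, as described in the comment above.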
---
Your current strategy is to trigger slices on all domains as soon as one domain reaches half of its minor heap. This is a bit heuristic: for instance, due to scheduling hazards, mutators on other domains could very well be barely restarting after their minor collection, in which case the splitting will not be effective in reducing latency. It might very well work in practice, but I was pointing out that there is no need for a heuristic approach, since with the semispace copying approach, the logic for when to do a slice is independent of the work of other domains.
I would not make this suggestion here if I thought it was as complex as you seem to think. I will be happy to discuss details elsewhere, but ocaml/RFCs is probably not a good place for this: it concerns the implementation rather than a language change, and some insiders have also privately discouraged its use, so I do not know what to think about it.
I am not worried about the try_lock; have you considered accessing the interruptor via all_domains instead of stw_participants to avoid having to lock at all? (see this approach here: ocaml@dcb6520#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1507-R1517)
---
> I would not make this suggestion here if I thought it was as complex as you seem to think.
As with anything in the GC implementation, my experience is that, for anything that is not trivial, there is typically a significant amount of implementation, debugging and tuning effort required. My approach has been to hack up a prototype and only test it on the best-case input for that design. If that is promising, then start implementing that feature for real.
> I will be happy to discuss details elsewhere but ocaml/RFCs is probably not a good place for this (this concerns the implementation rather than a language change, and also some insiders discouraged in private its use so I do not know what to think about it).
I shall bring this up again on the caml-devel list.
> have you considered accessing the interruptor via all_domains instead of stw_participants to avoid having to lock at all? (see this approach here: ocaml@dcb6520#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1507-R1517)
Is this safe? I am worried about races with other operations such as spawning and terminating domains. I felt uneasy mucking with their interrupt flag and young limit. If this is safe, it would be nice to write down the guarantee for such a global interruption as code comments, along with a short account of why it holds. What do you think?
Given that triggering slices on other domains happens once every minor GC cycle, the current approach of taking the lock for interruption should not be expensive.
---
The PR I pointed to (ocaml#11307) has to make a minor change to synchronize the read of interruptor_word with domain initialization. Modulo this change, this is safe because interruptor_word is set once and for all. I think ocaml#11307 documents the needed synchronisation well enough, but let me know if it is unclear (let's discuss it over there). Once ocaml#11307 is accepted, the function caml_interrupt_all_for_signal can be freely re-used for this purpose.
I don't have objections to the current heuristic for triggering slices nor to taking the lock; this is a good first shot at implementing the feature.
This is indeed something to be done. We now have instructions retired in Sandmark, which was quite useful to debug a few regressions.
Force-pushed ec6a975 to eb7a38d
```c
  }

  if (d->major_slice_epoch < atomic_load (&caml_major_slice_epoch)) {
    d->requested_major_slice = 1;
```
---
Since the condition `d->major_slice_epoch < atomic_load (&caml_major_slice_epoch)` plays the role of the payload of an inter-domain interrupt, it should be added among the conditions checked at the end of caml_reset_young_limit. Otherwise, a race might lead to the interrupt being dropped.
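A minimal illustration of the recheck pattern being requested. The names are modeled on the runtime's but heavily simplified, and this is a sketch of the race, not the actual implementation: `young_limit` doubles as the interrupt flag, so resetting it acknowledges any pending interrupt, and every payload condition must be re-examined afterwards.

```c
/* Sketch of the dropped-interrupt race and its fix. Simplified,
   illustrative names modeled on the runtime's. */
#include <stdatomic.h>
#include <stdint.h>

static atomic_long caml_major_slice_epoch;

struct dom {
  long major_slice_epoch;         /* domain-local copy of the epoch */
  int requested_major_slice;
  atomic_uintptr_t young_limit;   /* (uintptr_t)-1 means "interrupted" */
  uintptr_t real_young_limit;
};

static void reset_young_limit(struct dom *d)
{
  /* Clearing young_limit acknowledges any pending interrupt... */
  atomic_store(&d->young_limit, d->real_young_limit);
  /* ...so the payload conditions must be rechecked afterwards: a
     writer may have bumped the epoch just before the store above,
     and its interrupt would otherwise be lost. */
  if (d->major_slice_epoch < atomic_load(&caml_major_slice_epoch))
    d->requested_major_slice = 1;
  if (d->requested_major_slice)
    atomic_store(&d->young_limit, (uintptr_t)-1);  /* re-raise */
}
```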
---
Yes, this looks right. As a cosmetic comment, I think it would be clearer to read by giving it a name.
Closing this now that ocaml#11750 is open.
…l#13294)

The toplevel printer detects cycles by keeping a hashtable of values that it has already traversed. However, some OCaml runtime types (at least bigarrays) may be partially uninitialized, and hashing them at arbitrary program points may read uninitialized memory. In particular, the OCaml testsuite fails when running with a memory sanitizer enabled, as bigarray printing results in reads of uninitialized memory:

```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
    #1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
    #2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
    #3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

Uninitialized value was created by a heap allocation
    #0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
    #1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
    #2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
    #3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
    #4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```

The only use of hashing in genprintval is to avoid cycles, that is, it is only useful for OCaml values that contain other OCaml values (including possibly themselves). Bigarrays cannot introduce cycles, and they are always printed as "<abstr>" anyway. The present commit proposes to be more conservative in which values are hashed by the cycle detector to avoid this issue: we skip hashing any value with tag above No_scan_tag -- which may not contain any OCaml values.

Suggested-by: Gabriel Scherer <gabriel.scherer@gmail.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Co-authored-by: Edwin Török <edwin.torok@cloud.com>
Rework of #7. This now accounts for triggering a major slice on several domains.
Previously:
Currently: