Decouple major and minor2 #8

Closed — kayceesrk wants to merge 8 commits into trunk from decouple_major_and_minor2

Conversation

kayceesrk (Owner) commented Jul 1, 2022

Rework of #7. This now accounts for triggering a major slice on several domains.

Previously:

[image]

Currently:

[image]

kayceesrk force-pushed the decouple_major_and_minor2 branch from 0394148 to c90613b on July 1, 2022 08:56

ctk21 left a comment

So I think this works, but it is quite a complex area given all the possible interactions: domain lifecycle, backup thread servicing, etc. I wonder how best to test it; the fact that things work in the testsuite is good. I wonder whether we should run it through sandmark to check that the RSS and the total number of major cycles are not meaningfully changed.

I can see that triggering a best-effort slice without using the full STW mechanism is desirable to avoid synchronisation in advance_global_major_slice_epoch; on balance I don't think the added use of all_domains_lock in this PR is too much added complexity.

runtime/domain.c (Outdated)

```c
     global one [caml_major_slice_epoch]. */
  for(int i = 0; i < stw_domains.participating_domains; i++) {
    dom_internal * di = stw_domains.domains[i];
    stw_request.participating[i] = di->state;
```

I don't think you should set this: it isn't needed, and it might clobber stw_request, which isn't entirely protected by all_domains_lock in caml_try_run_on_all_domains_with_spin_work.

Remove this line?

kayceesrk (Owner, Author) replied:

Good catch. Looks like a copy-paste error. I'll remove this line.

I'll also remove the assertion on the next line:

```c
CAMLassert(!d->interruptor.interrupt_pending);
```

It is unclear to me why this is expected to hold.

runtime/domain.c (Outdated), comment on lines +1516 to +1532
```c
  /* This domain is the first one to use up half of its minor heap arena
     in this minor cycle. Trigger major slice on other domains. */
  if (caml_plat_try_lock(&all_domains_lock)) {
    /* Note that this interrupt is best-effort. If we get the lock,
       then interrupt all the domains. If not, either some other domain
       is calling for a stop-the-world section interrupting all the
       domains, or a domain is being created or terminated. All of these
       actions also try to lock [all_domains_lock] mutex, and the above
       lock acquisition may fail.

       If we don't get the lock, we don't interrupt other domains. This
       is acceptable since it does not affect safety but only liveness --
       the speed of the major gc. The other domains may themselves fill
       half of their minor heap triggering a major slice, or do it right
       after their next minor GC when they observe that their
       domain-local [Caml_state->major_slice_epoch] is less than the
       global one [caml_major_slice_epoch]. */
```

gadmm commented Oct 3, 2022


IIUC there is some involved logic to make sure that other domains work enough for the major GC before being forced into a minor collection.

Here is another possible approach (this is for the record, as I had the chance to propose this to KC verbally already).

  • Every time a minor collection is needed while the minor heap is half-empty, do a stop&copy collection into the free half of the minor heap. This reduces the amount of false promotions and (what is important here) avoids increasing the work debt for the major GC.
  • Every time a minor collection is needed before the major slice is done, simply transfer the work debt to the next minor cycle.

Note: the major slice happens when the minor heap is half-full, so the two trigger conditions coincide. Therefore, the major slice can be delayed, but the work debt never accumulates.

In this approach, any additional work needed to keep up with phase changes (see ocaml#11589) should be accounted for separately from the slice-at-middle rendez-vous.

kayceesrk (Owner, Author) replied:

Hi @gadmm, I'm trying to understand how the proposal to perform stop & copy collection in the minor heap solves the current problem. I'd think that it is an orthogonal improvement. Perhaps your intention is to document future GC improvements? I'm not sure what the best place for this would be. Perhaps ocaml/RFCs?

There are some (not insurmountable) challenges with implementing this copying collection for the minor heap, given that we may have pointers between the minor heap arenas of different domains. A parallel copying collector will have to synchronize to copy the objects to the to_space. While I've implemented simple Cheney copying collectors before, I don't know the state of the art in parallel copying collection.

Coming back to the lines of code under discussion: in practice, I would think that the best-effort interruption above works well. If the lock is not obtained, the worst that can happen is that the major slice is pushed to after the next minor collection on the other domains.

gadmm commented Oct 5, 2022

Your current strategy is to trigger slices on all domains as soon as one domain reaches half of its minor heap. This is a bit heuristic: for instance, due to scheduling hazards, mutators on other domains could very well be barely restarting after their minor collection, in which case the splitting will not be effective at reducing latency. It might very well work in practice, but I was pointing out that there is no need for a heuristic approach, since with the semispace copying approach the logic for when to do a slice is independent of the work of other domains.

I would not make this suggestion here if I thought it was as complex as you seem to think. I will be happy to discuss details elsewhere, but ocaml/RFCs is probably not a good place for this (it concerns the implementation rather than a language change, and some insiders have privately discouraged its use, so I do not know what to think about it).

I am not worried about the try_lock; have you considered accessing the interruptor via all_domains instead of stw_participants to avoid having to lock at all? (see this approach here: ocaml@dcb6520#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1507-R1517)

kayceesrk (Owner, Author) commented Oct 8, 2022

> I would not make this suggestion here if I thought it was as complex as you seem to think.

As with anything in the GC implementation, my experience is that anything non-trivial typically requires a significant amount of implementation, debugging and tuning effort. My approach has been to hack up a prototype and test it only on the best-case input for that design. If that is promising, then start implementing the feature for real.

> I will be happy to discuss details elsewhere but ocaml/RFCs is probably not a good place for this (this concerns the implementation rather than a language change, and also some insiders discouraged in private its use so I do not know what to think about it).

I shall bring this up again on caml-devel.

> have you considered accessing the interruptor via all_domains instead of stw_participants to avoid having to lock at all? (see this approach here: ocaml@dcb6520#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1507-R1517)

Is this safe? I am worried about races with other operations such as spawning and terminating domains. I felt uneasy mucking with their interruption flag and young limit. If this is safe, it would be nice to write down the guarantee for such a global interruption as code comments, along with a short account of why it holds. What do you think?

Given that triggering slices on other domains happens once every minor GC cycle, the current approach of taking the lock for interruption should not be expensive.

gadmm replied:

The PR I pointed to (ocaml#11307) has to make a minor change to synchronize the read of interruptor_word with domain initialization. Modulo this change, this is safe because interruptor_word is set once and for all. I think ocaml#11307 documents the needed synchronisation sufficiently, but let me know if it is unclear (let's discuss it over there). Once ocaml#11307 is accepted, the function caml_interrupt_all_for_signal can be freely reused for this purpose.

I don't have objections to the current heuristic for triggering slices, nor to taking the lock; this is a good first shot at implementing the feature.

kayceesrk (Owner, Author) replied:

> I wonder if running it through sandmark to see that the RSS and total number of major cycles is not meaningfully changed?

This is indeed something to be done. We now have instructions retired in Sandmark, which has been quite useful to debug a few regressions.

kayceesrk force-pushed the decouple_major_and_minor2 branch from ec6a975 to eb7a38d on October 5, 2022 07:45
runtime/domain.c

```c
  }

  if (d->major_slice_epoch < atomic_load (&caml_major_slice_epoch)) {
    d->requested_major_slice = 1;
```

Since the condition d->major_slice_epoch < atomic_load (&caml_major_slice_epoch) plays the role of the payload of an inter-domain interrupt, it should be added among the conditions checked at the end of caml_reset_young_limit. Otherwise, a race might lead to the interrupt being dropped.

kayceesrk (Owner, Author) replied:

Like this 0831bc3?


Yes, this looks right. As a cosmetic comment, I think it would be clearer to read if you gave it a name.


kayceesrk commented Dec 6, 2022

Closing this now that ocaml#11750 is open on ocaml/ocaml.

@kayceesrk kayceesrk closed this Dec 6, 2022
kayceesrk pushed a commit that referenced this pull request Aug 21, 2024
…l#13294)

The toplevel printer detects cycles by keeping a hashtable of values
that it has already traversed.

However, some OCaml runtime types (at least bigarrays) may be
partially uninitialized, and hashing them at arbitrary program points
may read uninitialized memory. In particular, the OCaml testsuite
fails when running with a memory sanitizer enabled, as bigarray
printing results in reads of uninitialized memory:

```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
    #1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
    #2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
    #3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

  Uninitialized value was created by a heap allocation
    #0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
    #1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
    #2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
    #3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
    #4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```

The only use of hashing in genprintval is to avoid cycles; that is, it
is only useful for OCaml values that contain other OCaml values
(possibly including themselves). Bigarrays cannot introduce cycles,
and they are always printed as "<abstr>" anyway.

The present commit proposes to be more conservative about which
values are hashed by the cycle detector: we skip hashing any value
with tag at or above No_scan_tag, since such values cannot contain
any OCaml values.

Suggested-by: Gabriel Scherer <gabriel.scherer@gmail.com>

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Co-authored-by: Edwin Török <edwin.torok@cloud.com>
