Conversation
Force-pushed 0394148 to c90613b
ctk21 left a comment:
So I think this works, but it is quite a complex area given all the possible interactions: domain lifecycle, backup thread servicing, etc. I wonder how best to test it - that things work in the testsuite is good. Perhaps also run it through sandmark to check that the RSS and the total number of major cycles are not meaningfully changed?
I can see that triggering a best-effort slice without using the full STW mechanism is desirable to avoid synchronisation in advance_global_major_slice_epoch; on balance I don't think the added use of all_domains_lock in this PR is too much added complexity.
runtime/domain.c (outdated):
```c
     global one [caml_major_slice_epoch]. */
  for (int i = 0; i < stw_domains.participating_domains; i++) {
    dom_internal * di = stw_domains.domains[i];
    stw_request.participating[i] = di->state;
```
---
I don't think you should set this - it isn't needed, and it might also clobber stw_request, which isn't entirely protected by all_domains_lock in caml_try_run_on_all_domains_with_spin_work.
Remove this line?
---
Good catch. Looks like a copy-paste error. I'll remove this line.
I'll also remove the assertion on the next line:
`CAMLassert(!d->interruptor.interrupt_pending);` -- it is unclear to me why this is expected to hold.
runtime/domain.c (outdated):
```c
  /* This domain is the first one to use up half of its minor heap arena
     in this minor cycle. Trigger major slice on other domains. */
  if (caml_plat_try_lock(&all_domains_lock)) {
    /* Note that this interrupt is best-effort. If we get the lock,
       then interrupt all the domains. If not, either some other domain
       is calling for a stop-the-world section interrupting all the
       domains, or a domain is being created or terminated. All of these
       actions also try to lock [all_domains_lock] mutex, and the above
       lock acquisition may fail.

       If we don't get the lock, we don't interrupt other domains. This
       is acceptable since it does not affect safety but only liveness --
       the speed of the major gc. The other domains may themselves fill
       half of their minor heap triggering a major slice, or do it right
       after their next minor GC when they observe that their
       domain-local [Caml_state->major_slice_epoch] is less than the
       global one [caml_major_slice_epoch]. */
```
---
IIUC there is some involved logic to make sure that other domains work enough for the major GC before being forced into a minor collection.
Here is another possible approach (this is for the record, as I had the chance to propose this to KC verbally already).
- Every time a minor collection is needed while the minor heap is half-empty, do a stop&copy collection into the free half of the minor heap. This reduces the number of false promotions and (what is important here) avoids increasing the work debt for the major GC.
- Every time a minor collection is needed before the major slice is done, simply transfer the work debt to the next minor cycle.
Note: the major slice happens when the minor heap is half-full, so the two trigger conditions coincide. Therefore, the major slice can be delayed, but the work debt never accumulates.
In this approach, any additional work needed to keep up with phase changes (see ocaml#11589) should be accounted for separately from the slice-at-middle rendez-vous.
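As a side note, the stop&copy step proposed above can be sketched with a toy Cheney-style copier. This is only a minimal sketch under simplifying assumptions: every name here is hypothetical, objects have a single pointer field, and none of the real runtime's concerns (headers, tags, cross-domain pointers, parallel synchronisation) are modeled.

```c
/* Toy Cheney-style semispace copy (hypothetical layout: one payload
   word and one pointer field per object). */
#include <stddef.h>

#define TO_SPACE_SIZE 64

typedef struct obj {
  struct obj *fwd;    /* forwarding pointer, NULL if not yet copied */
  long payload;
  struct obj *field;  /* the single pointer field of this toy object */
} obj;

static obj to_space[TO_SPACE_SIZE];
static size_t to_top = 0;

/* Evacuate one object to to-space, installing a forwarding pointer
   so that sharing and cycles are preserved. */
static obj *copy(obj *o)
{
  if (o == NULL) return NULL;
  if (o->fwd != NULL) return o->fwd;   /* already evacuated */
  obj *n = &to_space[to_top++];
  *n = *o;
  n->fwd = NULL;
  o->fwd = n;
  return n;
}

/* Cheney scan: evacuate the root, then sweep to-space left to right,
   evacuating everything it points to. No recursion, no mark stack. */
static obj *collect(obj *root)
{
  size_t scan = to_top;
  obj *r = copy(root);
  while (scan < to_top) {
    to_space[scan].field = copy(to_space[scan].field);
    scan++;
  }
  return r;
}
```

The forwarding-pointer check in `copy` is what keeps the copy linear in the live data even in the presence of sharing and cycles.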
---
Hi @gadmm, I'm trying to understand how the proposal to perform stop & copy collection in the minor heap solves the current problem. I'd think that is an orthogonal improvement. Perhaps your intention is to document future GC improvements. Not sure what the best place for this would be. Perhaps ocaml/RFCs?
There are some (not insurmountable) challenges with implementing this copying collection for minor heap given that we may have pointers between the minor heap arenas of different domains. A parallel copying collector will have to synchronize to copy the objects to the to_space. While I've implemented simple Cheney copying collectors before, I don't know the state of the art in parallel copying collections.
Coming back to the lines of code under discussion, in practice, I would think that the best-effort interruption above would work well. If the lock is not obtained, the worst that can happen is that the major slice is pushed to after the next minor collection on the other domains.
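The best-effort scheme under discussion can be sketched as follows. This is an illustration only, with made-up names (`global_epoch`, `try_trigger_slice_on_all`, `slice_needed`), not the runtime's actual API: the request is published through an epoch counter before the try-lock, so even when the eager interrupt is skipped, each domain still notices the stale epoch after its next minor GC.

```c
/* Best-effort slice trigger: publish a new epoch, then interrupt all
   domains only if the lock is uncontended. Illustrative names only. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t all_domains_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long global_epoch;
static _Thread_local long local_epoch;

/* Returns true if the other domains were interrupted eagerly. */
static bool try_trigger_slice_on_all(void (*interrupt_all)(void))
{
  atomic_fetch_add(&global_epoch, 1);  /* publish the request first */
  if (pthread_mutex_trylock(&all_domains_lock) != 0)
    return false;  /* contended: fall back to the lazy epoch check */
  if (interrupt_all != NULL)
    interrupt_all();
  pthread_mutex_unlock(&all_domains_lock);
  return true;
}

/* Each domain calls this after its minor GC; a stale local epoch
   schedules the slice even when the eager interrupt was skipped. */
static bool slice_needed(void)
{
  long g = atomic_load(&global_epoch);
  if (local_epoch < g) { local_epoch = g; return true; }
  return false;
}
```

Publishing the epoch before attempting the lock is what makes losing the try-lock harmless: liveness degrades by at most one minor cycle, as described in the comment above.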
---
Your current strategy is to trigger slices on all domains as soon as one domain reaches half of its minor heap. This is a bit heuristic: for instance, due to scheduling hazards, mutators on other domains could very well be barely restarting after their minor collection, in which case the splitting will not be effective in reducing latency. It might very well work in practice, but I was pointing out that there is no need for a heuristic approach, since with the semispace copying approach, the logic for when to do a slice is independent of the work of other domains.
I would not make this suggestion here if I thought it was as complex as you seem to think. I will be happy to discuss details elsewhere, but ocaml/RFCs is probably not a good place for this: it concerns the implementation rather than a language change, and some insiders have also privately discouraged its use, so I do not know what to think about it.
I am not worried about the try_lock; have you considered accessing the interruptor via all_domains instead of stw_participants to avoid having to lock at all? (see this approach here: ocaml@dcb6520#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1507-R1517)
---
> I would not make this suggestion here if I thought it was as complex as you seem to think.
As with anything in the GC implementation, my experience is that, for anything that is not trivial, there is typically a significant amount of implementation, debugging and tuning effort required. My approach has been to hack up a prototype and only test it on the best-case input for that design. If that is promising, then start implementing that feature for real.
> I will be happy to discuss details elsewhere but ocaml/RFCs is probably not a good place for this (this concerns the implementation rather than a language change, and also some insiders discouraged in private its use so I do not know what to think about it).
I shall bring this up again on the caml-devel list.
> have you considered accessing the interruptor via all_domains instead of stw_participants to avoid having to lock at all? (see this approach here: ocaml@dcb6520#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1507-R1517)
Is this safe? I am worried about races with other operations such as spawning and terminating domains. I felt uneasy mucking with their interrupt flag and young limit. If this is safe, it would be nice to write down the guarantee for such a global interruption as code comments, along with a short account of why it holds. What do you think?
Given that triggering slices on other domains happens once every minor GC cycle, the current approach of taking the lock for interruption should not be expensive.
---
The PR I pointed to (ocaml#11307) has to make a minor change to synchronize the read of interruptor_word with domain initialization. Modulo this change, this is safe because interruptor_word is set once and for all. I think ocaml#11307 documents the needed synchronisation well enough, but let me know if it is unclear (let's discuss it over there). Once ocaml#11307 is accepted, the function caml_interrupt_all_for_signal can be freely re-used for this purpose.
I don't have objections to the current heuristic for triggering slices nor to taking the lock; this is a good first shot at implementing the feature.
This is indeed something to be done. We now have instructions retired in Sandmark, which was quite useful to debug a few regressions.
Force-pushed ec6a975 to eb7a38d
```c
  }

  if (d->major_slice_epoch < atomic_load (&caml_major_slice_epoch)) {
    d->requested_major_slice = 1;
```
---
Since the condition `d->major_slice_epoch < atomic_load (&caml_major_slice_epoch)` plays the role of the payload of an inter-domain interrupt, it should be added among the conditions checked at the end of caml_reset_young_limit. Otherwise, a race might lead to the interrupt being dropped.
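A minimal illustration of the recheck pattern being requested. The names are modeled on the runtime's but heavily simplified, and this is a sketch of the race, not the actual implementation: `young_limit` doubles as the interrupt flag, so resetting it acknowledges any pending interrupt, and every payload condition must be re-examined afterwards.

```c
/* Sketch of the dropped-interrupt race and its fix. Simplified,
   illustrative names modeled on the runtime's. */
#include <stdatomic.h>
#include <stdint.h>

static atomic_long caml_major_slice_epoch;

struct dom {
  long major_slice_epoch;         /* domain-local copy of the epoch */
  int requested_major_slice;
  atomic_uintptr_t young_limit;   /* (uintptr_t)-1 means "interrupted" */
  uintptr_t real_young_limit;
};

static void reset_young_limit(struct dom *d)
{
  /* Clearing young_limit acknowledges any pending interrupt... */
  atomic_store(&d->young_limit, d->real_young_limit);
  /* ...so the payload conditions must be rechecked afterwards: a
     writer may have bumped the epoch just before the store above,
     and its interrupt would otherwise be lost. */
  if (d->major_slice_epoch < atomic_load(&caml_major_slice_epoch))
    d->requested_major_slice = 1;
  if (d->requested_major_slice)
    atomic_store(&d->young_limit, (uintptr_t)-1);  /* re-raise */
}
```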
---
Yes, this looks right. As a cosmetic comment, I think it would be clearer to read by giving it a name.
Closing this now that ocaml#11750 is open.
…l#13294)

The toplevel printer detects cycles by keeping a hashtable of values that it has already traversed. However, some OCaml runtime types (at least bigarrays) may be partially uninitialized, and hashing them at arbitrary program points may read uninitialized memory. In particular, the OCaml testsuite fails when running with a memory sanitizer enabled, as bigarray printing results in reads of uninitialized memory:

```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
    #1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
    #2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
    #3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

Uninitialized value was created by a heap allocation
    #0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
    #1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
    #2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
    #3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
    #4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```

The only use of hashing in genprintval is to avoid cycles, that is, it is only useful for OCaml values that contain other OCaml values (including possibly themselves). Bigarrays cannot introduce cycles, and they are always printed as "<abstr>" anyway. The present commit proposes to be more conservative in which values are hashed by the cycle detector to avoid this issue: we skip hashing any value with tag above No_scan_tag -- which may not contain any OCaml values.

Suggested-by: Gabriel Scherer <gabriel.scherer@gmail.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Co-authored-by: Edwin Török <edwin.torok@cloud.com>
Rework of #7. This now accounts for triggering a major slice on several domains.
Previously:
Currently: