Implement quality treatment for asynchronous actions in multicore (2/N) by gadmm · Pull Request #11190 · ocaml/ocaml

gadmm · 2022-04-13T18:38:16Z

This PR is another part of #11057 and comes after #11095.

This restores the polling behaviour in C code of OCaml 4 by delaying finalisers until the next suitable safe point.

This also removes races whereby signals can be forgotten.

Reviewing is a bit more involved than going commit-per-commit, see #11095 (comment). In particular, some FIXME comments are added about existing bugs, and not fixed in this PR but later ones.

cc @sadiqj, @gasche. (I think @stedolan, @jhjourdan can also be interested.)

gasche · 2022-04-14T15:04:35Z

I'm a bit sad to see the new user interface for Alloc_small (the "block" parameter) go away to be replaced by the weird Setup_for_gc and Restore_after_gc from before. I find the "new" interface easier to read, morally we are passing a higher-order parameter by position (so almost like a real function), instead of passing two partial pieces of code by name. This has the cost of some added redundancy, but maybe those could be hidden by some simpler functions re-exporting the caml_alloc_small_dispatch with common parameters (nallocs = 1, alloc_lens = NULL). The more frustrating aspect is the need to pass wosize to both Alloc_small itself and the gc handler, I don't know how to avoid a painful-in-practice redundancy there.

gadmm · 2022-04-14T15:13:15Z

So am I. What are you suggesting?

gasche · 2022-04-14T15:19:50Z

Maybe the following could work:

Alloc_small declares an internal variable wosize (this would also protect against multiple evaluation of the macro parameter)
then we can do Alloc_small(result, 1, tag, { caml_gc_dispatch_simple(wosize, CAML_FROM_C) } and it works

gasche · 2022-04-14T15:22:50Z

runtime/domain.c

  atomic_exchange(&dom_st->young_limit, (uintnat)dom_st->young_start);
-  if (caml_incoming_interrupts_queued()
+  dom_internal * d = &all_domains[dom_st->id];
+  if (atomic_load_relaxed(&d->interruptor.interrupt_pending)


I think caml_incoming_interrupts_queued was an acquire load, which is replaced by a relaxed load here. Don't we need an acquire load to properly synchronize with the STW leader?

We do not need an acquire load. There is no subsequent read that it needs to synchronize with. The synchronisation we need is with the preceding read done by atomic_exchange.

gasche · 2022-04-14T15:26:19Z

runtime/domain.c

     achieves the proper synchronisation. */
  atomic_exchange(&dom_st->young_limit, (uintnat)dom_st->young_start);
-  if (caml_incoming_interrupts_queued()
+  dom_internal * d = &all_domains[dom_st->id];


This changes seems to be paving the way to call caml_reset_young_limit at another domain, but I can't tell (neither in #11057) where that would happen. When would we need it? (Are you planning to handle cross-domain interrupts with this mechanism, by calling caml_resete_young_limit on the thread to interrupt?)

"private communication" from @sadiqj.

I think this is a desired direction for the domains implementation, and the current style of the surrounding code. (I mean relying less on Caml_state but passing domain_state around.)

Are you sure this was me? I'm not sure I recognise that code.

It was actually a quote from @ctk21 which you forwarded to me. It was a generic remark about passing the domain state around rather than introducing calls to Caml_state (something about work stealing for STW sections, and also the cost of Caml_state on some platforms).

gadmm · 2022-04-14T15:39:21Z

@sadiqj I have an occasional failure of the test weak-ephe-final/finaliser2.ml as described here: #11057 (comment). Is there anything already known about this issue?

gadmm · 2022-04-14T16:19:15Z

I have considered keeping the "higher-order" style, but my conclusion was that it is not adapted to the situation. The usage was more adapted in multicore but this usage has now disappeared. There is no relationship with the old usage, for which I think it would make everything needlessly complicated (and not really higher-order). In particular I am worried about the multiplicity of parameters to Alloc_small in OCaml 4 to pass around. If you look at multicore usages, the duplication and non-locality of delicate code was less of a problem.

Now the old style was perfectly fine for OCaml 4 and what motivated the new style is no longer there, so I am happy with this PR.

gadmm · 2022-04-14T23:15:21Z

I just learnt that higher-order macros are possible, so this might after all give a nice simplification: https://godbolt.org/z/Wa16erbr4. However I feel like this improvement is orthogonal to the current PR, since this is an improvement over both the OCaml 4 style and the multicore style, and I prefer to preserve whatever steam I have left for the rest of the review process.

sadiqj · 2022-04-15T12:08:33Z

@sadiqj I have an occasional failure of the test weak-ephe-final/finaliser2.ml as described here: #11057 (comment). Is there anything already known about this issue?

@fabbing hit this on the frame pointers PR too. I think @jonludlam was going to take a look.

sadiqj

Apologies for the delay in taking a look at this @gadmm and thanks for your effort in actually getting it done. For anyone else looking at the diff, large parts of the early commits are essentially code from 4.x (including the Setup_for_gc and Restore_after_gc) and while they could be improved I don't think we're in any worse position code-wise than we were before only now we benefit from asynchronous exception safety. This also gets us a lot closer to statmemprof.

Only a couple of minor points that it would be good to get clarification on which were left as comments.

There's an increased cost in leave blocking section but checking for signals is pretty cheap with Xavier's single word check. There's also an extra TLS though, which might be something to watch on some platforms.

@gasche were you planning to do a review?

runtime/domain.c

runtime/signals.c

gadmm · 2022-05-19T13:44:16Z

I rebased & added Changes. Eventually decided for the higher-order macro.

gasche · 2022-05-19T14:13:19Z

I haven't looked at the rest, but I like the higher-order macro. It's much less scary than it sounds (the code does not look complex), and less verbose than reverting to the previous approach for most use-cases.

sadiqj · 2022-05-25T13:56:47Z

I've taken a look and am happy with the macro changes. Could you resolve the conflict in domain.c (the eventring PR moves the EV_MINOR event out of caml_poll_gc_work so you can just remove it) and I can get this merged @gadmm ?

gadmm · 2022-05-25T14:44:23Z

done.

sadiqj · 2022-05-25T15:02:25Z

Well that's a curious testsuite failure. Why does this change the number of major collections and only on OS X?

gadmm · 2022-05-25T15:13:18Z

This PR includes a fix to gc_full_major_exn, but apart from that it does not ring any bell on my side, especially given the platform-specific behaviour.

gadmm · 2022-05-25T17:14:57Z

Could you please have a look? I am a bit busy right now and I am not familiar with this eventlog test, it might have to do with the test itself. If I follow correctly the problem is that there is an extra slice here and there at each major GC in a reproducible manner (12000 vs 8000 for 2000 full major GC running 3 major collections each = 0 to 1 extra slice per major collection). But I do not see why you check the number of slices in the first place, since it might not be a portable number or a number stable across changes to GC internals.

sadiqj · 2022-05-25T17:28:52Z

I'll take a look. The test needs to be sufficiently low level because runtime events is fairly tied to the GC internals. I may have got that particular test wrong though.

gadmm · 2022-06-07T14:55:49Z

Is there any update on this? It would be nice to get to the part 3/N soon (where in fact N=4 but part 4/4 is not needed for 5.0). I strongly suspect that the failing test is not the fault of this PR, could it be merged and you look for the problem in parallel?

gasche · 2022-06-07T19:28:54Z

Could you by chance rebase on the current trunk? This would restart the CI, which may help establishing that the OSX failure was flimsy.

sadiqj · 2022-06-07T20:58:08Z

Sorry @gadmm and thanks for the poke, I got blocked on reproducing on OS X.

I want to understand why this test is only failing with this PR, I haven't seen it fail on any other setup yet.

kayceesrk · 2022-06-08T04:30:45Z

I can recreate this test tests/lib-runtime-events/'test_caml.ml' failure on both Linux and macOS. Both fail similarly:

$ make one TEST=tests/lib-runtime-events/'test_caml.ml'
....
> -minors: 18000, majors: 8000
> +minors: 18000, majors: 12000

@sadiqj can you check whether the test fails on Linux.

gadmm · 2022-06-08T12:39:48Z

We do not seem to see this failure in the same way. I don't think it is "flimsy" in the sense that running it again will make the failure disappear. This PR could change minor details of GC (changes to gc_ctrl.c, changes to the scheduling of finalisers) that causes the numbers to change with this PR and not others, but this is irrelevant.

It seems to me that the result of the test could depend on irrelevant machine- and environment-specific differences. Whether there is one more or one fewer slice could depend on the size of the heap. And on my machine simply opening a toplevel in various directories and running Gc.stats () shows that the size of the heap can vary ~4% depending on the current working directory.

So here you get 3 slices per major instead of 2, but until proof of the contrary this does not point at a problem with my PR. Please disable the test and then I will rebase so that we can get CI green.

To write a reliable test you could try to compare numbers from ones obtained using GC timing hooks instead of using arbitrary numbers in the reference file.

sadiqj · 2022-06-08T13:53:03Z

That test has been fixed on trunk now, if you rebase now @gadmm we should be good to go.

This commit does not change behaviour. It rephrases `Alloc_small` in a way that is more flexible. We will need the additional flexibility later on. Prelude to removing the loop and generalizing `caml_reset_young_limit` to other pending actions, which is necessary for delaying them.

We do this by reintroducing `caml_alloc_small_dispatch` from pre-multicore trunk. We can now update `young_limit` to its desired value without causing infinite loops when delaying callbacks. Prerequisite to delaying callbacks and reintroducing memprof.

Useful later on

This lets us reset [young_limit] _before_ running the asynchronous callbacks, and avoids polling repeatedly for signals that are not for us.

Fixes test `tests/weak-ephe-final/finaliser2.ml`.

gadmm · 2022-06-08T15:02:45Z

Thanks, done. Let's keep fingers crossed.

sadiqj · 2022-06-08T15:25:11Z

Looks good. Thanks for the work and the prod to get it merged @gadmm

gasche reviewed Apr 14, 2022

View reviewed changes

kayceesrk mentioned this pull request Apr 16, 2022

Make finaliser2 test deterministic #11201

Merged

gadmm force-pushed the multicore_async_actions_2 branch from 47c4e00 to 5dd2744 Compare April 18, 2022 21:53

gadmm mentioned this pull request Apr 27, 2022

Implement quality treatment for asynchronous actions in multicore #11057

Closed

sadiqj approved these changes May 15, 2022

View reviewed changes

runtime/domain.c Show resolved Hide resolved

runtime/signals.c Outdated Show resolved Hide resolved

gadmm force-pushed the multicore_async_actions_2 branch from 5dd2744 to e5ae7dd Compare May 19, 2022 13:41

gadmm force-pushed the multicore_async_actions_2 branch from e5ae7dd to 1fb559a Compare May 25, 2022 14:42

kayceesrk self-assigned this May 31, 2022

sadiqj mentioned this pull request Jun 8, 2022

Make runtime_events test more robust #11305

Merged

gadmm added 10 commits June 8, 2022 16:53

[minor] Rename conflicting macro BITS_PER_WORD

58eaef3

Remove loop in Alloc_small

9da68be

We do this by reintroducing `caml_alloc_small_dispatch` from pre-multicore trunk. We can now update `young_limit` to its desired value without causing infinite loops when delaying callbacks. Prerequisite to delaying callbacks and reintroducing memprof.

Take signals into account for caml_reset_young_limit

f68a5e9

Introduce atomic_{load,store}_relaxed shorthands

a158fbc

Useful later on

Reintroduce [caml_something_to_do] as [Caml_state->action_pending]

1b6f8f8

This lets us reset [young_limit] _before_ running the asynchronous callbacks, and avoids polling repeatedly for signals that are not for us.

Delay finalisers and reimplement [caml_process_pending_actions_exn]

a7f305b

Further fixes to gc_ctrl.c

8f33bfa

Fixes test `tests/weak-ephe-final/finaliser2.ml`.

Fix test callback/test_signalhandler.ml

93b3981

Changes

617567c

gadmm force-pushed the multicore_async_actions_2 branch from 1fb559a to 617567c Compare June 8, 2022 14:58

sadiqj merged commit cbcb507 into ocaml:trunk Jun 8, 2022

gadmm mentioned this pull request Jun 9, 2022

Implement quality treatment for asynchronous actions in multicore (3/3?) #11307

Merged

Conversation

gadmm commented Apr 13, 2022

Uh oh!

gasche commented Apr 14, 2022

Uh oh!

gadmm commented Apr 14, 2022

Uh oh!

gasche commented Apr 14, 2022

Uh oh!

gasche Apr 14, 2022

Choose a reason for hiding this comment

Uh oh!

gadmm Apr 14, 2022

Choose a reason for hiding this comment

Uh oh!

gasche Apr 14, 2022

Choose a reason for hiding this comment

Uh oh!

gadmm Apr 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sadiqj Apr 14, 2022

Choose a reason for hiding this comment

Uh oh!

gadmm Apr 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gadmm commented Apr 14, 2022

Uh oh!

gadmm commented Apr 14, 2022

Uh oh!

gadmm commented Apr 14, 2022

Uh oh!

sadiqj commented Apr 15, 2022

Uh oh!

sadiqj left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gadmm commented May 19, 2022

Uh oh!

gasche commented May 19, 2022

Uh oh!

sadiqj commented May 25, 2022

Uh oh!

gadmm commented May 25, 2022

Uh oh!

sadiqj commented May 25, 2022

Uh oh!

gadmm commented May 25, 2022

Uh oh!

gadmm commented May 25, 2022

Uh oh!

sadiqj commented May 25, 2022

Uh oh!

gadmm commented Jun 7, 2022

Uh oh!

gasche commented Jun 7, 2022

Uh oh!

sadiqj commented Jun 7, 2022

Uh oh!

kayceesrk commented Jun 8, 2022

Uh oh!

gadmm commented Jun 8, 2022

Uh oh!

sadiqj commented Jun 8, 2022

Uh oh!

gadmm commented Jun 8, 2022

Uh oh!

sadiqj commented Jun 8, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

gadmm Apr 14, 2022 •

edited

Loading

gadmm Apr 14, 2022 •

edited

Loading

sadiqj left a comment •

edited

Loading