Simplifications/cleanups/clarifications from multicore review by gadmm · Pull Request #10966 · ocaml/ocaml

gadmm · 2022-01-28T17:27:12Z

This PR is best read commit-by-commit. I grouped them together because they are all minor changes coming from my review of the multicore PR and are expected to introduce no behaviour change (and I think will interest @kayceesrk and @sadiqj).

As discussed, document caml_modify. Confusion stems from the fact that the PLDI paper describes a publication scheme based on the old concurrent collector design, so we document the difference with the paper. (As a side note, have the benchmarks from the paper been re-run with the release store instead of a relaxed store? If there is an impact then one could try optimizations based on the write barrier.)
Remove some dead code.
Make some currently-public APIs private.
Simplify/restore the APIs for signals/actions. Regarding the public API this is necessary work for 5.0, it prepares trunk for a 5.0 release in case Meta-issue on asynchronous action handling in multicore #10915 does not move fast enough. The rest is preliminary work to solving the issue.

I believe that this PR will not need too many discussions, but if you want me to split any part of it to a different PR do not hesitate to request it.

Each commit has a detailed commit log that gives more details.

xavierleroy

Thank you for all the attention you gave to the runtime system code. I read commit by commit. The first 5 look obviously correct, and the last 2 are a bit above my grade but I trust you that they do not change behavior yet. So, I'm approving this PR but would prefer to have a second review.

runtime/domain.c

sadiqj · 2022-01-30T16:43:35Z

I've started reading through and I can finish the review of by mid-week.

One thing I am not sure about is whether it's the right thing to reintroduce caml_process_pending_actions_exn. While the reimplementation complies with the _exn bit, it doesn't actually process pending actions - only signals. If there is code that has long-running C code that relies on that as a poll, it will compile but may perform terribly if there are multiple domains running.

gadmm · 2022-01-31T14:17:57Z

I would tend to agree with you regarding caml_process_pending_action_exn and reintroduce it in the API only commented-out. But polling for signals is barely just what is needed to avoid a big temporary rework of the Coq VM. (The other regression that affects the Coq VM is allocation calling finalisers right away, but they seem to avoid it by not using finalisers at all.) Right @silene?

However I will document the limitation.

gadmm · 2022-01-31T14:28:46Z

The 6th and 7th commit are best reviewed by comparing to pre-multicore trunk. It mainly reintroduces what was already there. Only for runtime/weak.c in the 7th I had to get creative, so you might want to proofread it twice.

sadiqj · 2022-01-31T14:33:05Z

I would tend to agree with you regarding caml_process_pending_action_exn and reintroduce it in the API only commented-out. But polling for signals is barely just what is needed to avoid a big temporary rework of the Coq VM.

So my understanding from rocq-prover/rocq#15464 (comment) and the discussion afterward is that switching to caml_process_pending_signals_exn (which is effectively what actions would now do) is sufficient.

silene · 2022-01-31T14:47:03Z

Right, there are no meaningful finalizers in Coq.

Basically, what Coq needs is an extremely cheap way of testing that a signal (namely sigint) is pending. By extremely cheap, I mean that, ideally, it should be nothing more than reading a single word in memory. If the test is positive, then we ask the runtime to trigger the resolution of the signal. The only thing we need is that, whatever the runtime performs at this point, if an exception is raised (and for sigint, it is), we have a way to intercept the exception to cleanup after ourselves before reraising the exception.

Whether we detect only pending signals or other pending events, we do not care, as long as the test is cheap and has no false positive. And if the runtime decides to perform a lot more work than just resolving the pending signal, that is also fine with us.

silene · 2022-01-31T15:04:09Z

And just to be complete, handling signals is just one of the issues. The other is what to do with Alloc_small. Indeed, we pass as the last macro argument some C code that calls caml_process_pending_actions. Again, we do not care what the runtime performs at this point and how long it takes. We just need some way to catch any exception that might be raised.

gadmm · 2022-01-31T16:50:12Z

Right. One point is that caml_process_pending_signals_exn is internal (but Coq VM already uses internals) and we can avoid having one more preprocessor switch in the Coq VM. It's as you prefer.

Coq VM will indeed need to be changed to call Alloc_small with { caml_handle_gc_interrupt(); } as last parameter. It might be forward-compatible for OCaml 5. (caml_handle_gc_interrupt_no_async_exc is renamed into caml_handle_gc_interupt in this PR, although it currently calls finalisers in both versions; later it should not).

So there's maybe the option of not requiring more changes to the CoqVM later.

It's worth testing the performance impact of polling in the new version. caml_check_pending_actions is broken currently, do not use it. caml_process_pending_*_exn are already guarded by caml_check_for_pending_signals which is much more efficient than it used to be. caml_process_pending_actions_* is always meant to be fast when no actions are pending.

silene · 2022-01-31T17:44:43Z

caml_check_for_pending_signals is much more efficient than it used to be

Perhaps, but it is still incredibly slower than the approach available in OCaml 4. In the old world, we could just check the value of caml_something_to_do. Now we have to execute the following function:

  atomic_thread_fence(memory_order_acquire);
  for (i = 0; i < NSIG_WORDS; i++) {
    if (atomic_load_explicit(&caml_pending_signals[i], memory_order_relaxed))
      return 1;
  }

By the way, was the performance of OCaml bytecode interpreter tested? The old one was checking caml_something_to_do several million times per second. Now that caml_check_for_pending_signals is called instead, I expect performance to be disastrous. I just tried on a simple loop computing Fibonacci. With OCaml trunk, the bytecode interpreter is 600% slower than OCaml 4.13. In fact, 75% of the time is actually wasted inside caml_check_for_pending_signals, according to perf.

gadmm · 2022-01-31T17:48:08Z

Believe it or not, this function amounts to a test of a single machine word and is compiled as such! See #10880. How recent is your benchmark?

gadmm · 2022-01-31T17:55:02Z

I am interested in feeback regarding the perf of caml_pending_actions*, later. (OCaml ≥ 4.10 has the same issue.) If needed we can declare it as inline in some way.

xavierleroy · 2022-01-31T18:01:17Z

I just tried on a simple loop computing Fibonacci. With OCaml trunk, the bytecode interpreter is 600% slower than OCaml 4.13. In fact, 75% of the time is actually wasted inside caml_check_for_pending_signals, according to perf.

That was the case a while ago, before #10880 was merged. Performance should be much better now, but can probably be improved some more by just testing young_ptr against young_limit.

gadmm · 2022-01-31T18:14:19Z

just testing young_ptr against young_limit

This would not work currently.

xavierleroy · 2022-01-31T18:15:37Z

This would not work currently.

Couldn't we make it work? It's good enough for native code, so it should be doable in bytecode too.

gadmm · 2022-01-31T18:20:31Z

I might have misread your comment. For @silene it is probably not enough to test young_limit because it is reset by allocations. Eventually for OCaml's bytecode interpreter the goal is to just have this test, yes. But if the issue mentioned at #10880 (comment) has the same origin then this is fixed on the way to solving #10915 (edit: looking at it again I do not see an indication that the failing test depends on #10915; I do not mean to discourage looking at the issue independently).

silene · 2022-01-31T18:35:37Z

How recent is your benchmark?

Indeed, my mistake. After recompiling OCaml trunk, it is not as slow, but it is still 70% slower than OCaml 4.08 and 4.13. That said, it does not seem to be due to caml_check_for_pending_signals, which is now down to 4% of the total time.

For the record, in the loop version of Fibonacci, time is spent inside caml_modify, while in the recursive version, time is spent inside caml_ensure_stack_capacity. But, while the time spent there is significant, it does not entirely explain the slowdown.

sadiqj

This looks good @gadmm, thanks for going through these. Other than the review comment on the diff, I'm still uneasy about having a caml_process_pending_actions_exn that doesn't actually process pending actions, only signals. That said the only real user we can find of the API is probably aware of the problem and we intend to fix it eventually.

sadiqj · 2022-02-01T19:03:36Z

otherlibs/systhreads/st_stubs.c

  if (atomic_load_acq(&Thread_main_lock.waiters) == 0) return Val_unit;

-  caml_process_pending_signals();
+  caml_process_pending_actions();


I'm not convinced this matches the behaviour in 4.x. If we look at https://github.com/ocaml/ocaml/blob/4.14/otherlibs/systhreads/st_stubs.c#L734-L739 we explicitly only deal with signals in caml_thread_yield.

I agree, I think my reasoning was that in theory with new polling locations, any signal handler could execute other actions and so the distinction is no longer relevant. But the reasoning was probably flawed and in any case not for this PR which is meant to be side-effect-free.

Some documentation comment erroneously disappeared in multicore, which encouraged to keep consistency with caml_enter/leave_blocking_section, which was useful information in this case. I restored the comment.

In light of this comment, the line Current_thread = st_tls_get(Thread_key); was added in caml_thread_enter_blocking_section but not in caml_thread_yield, so there is a now a discrepancy, please have a look.

Yes, this is a good point. @Engil do you happen to know how we ended up with this? trunk seems to update the current thread when entering a blocking section:

ocaml/otherlibs/systhreads/st_stubs.c

Line 210 in 1f9c958

Current_thread = st_tls_get(Thread_key);

but this doesn't seem to make sense? I think you must be the current thread in order to call caml_thread_enter_blocking_section.

(4.14 doesn't do this: https://github.com/ocaml/ocaml/blob/4.14/otherlibs/systhreads/st_stubs.c#L230 )

I think this is an oversight and is not required.
Weird interactions with backup threads or else do not seem likely at all (since a thread going through caml_thread_enter_blocking_section is the thread actually holding the lock and the thread that has been executing code.).
I think it is safe to remove this.

Ok, thanks. I'll make a PR doing so.

otherlibs/systhreads/st_stubs.c

gadmm · 2022-02-08T23:58:11Z

I have to leave your remaining remark for another time.

No change entry needed.

Remove caml_huge_fallback_count and FIXME comment. It never worked (has always been equal to zero) so it has no potential users, and a grep of opam sources returns no hit. There is little risk in removing it, and 5.0 is a good time for this. No change entry needed.

`caml/misc.h`: This header is very commonly included and I think it is not intended to make these functions and macros public. `caml/platform.h`: The whole header is made a CAML_INTERNALS. It contains non-prefixed identifiers and no function of public interest No change entry needed.

…c_interrupt Eventually we will only need the one that does not run finalisers, once it is implemented. No change entry needed.

gadmm · 2022-02-09T21:45:17Z

I have removed the commit about documenting caml_modify. Stephen is preparing us a much better explanation.
I have somehow fixed the implementation of caml_check_pending_actions, ~~which happens to fix a bug in io.c~~ (edit: hasty conclusion).
I have removed caml_process_pending_actions_exn for now. I have left it commented-out in the API for documentation purposes. @silene, please use the internal caml_process_pending_signals_exn in the meanwhile if OCaml 5.0 is released without this function.

Please review the last three commits (the others have not changed) and then I'll clean up the history.

sadiqj

This looks good now, thanks @gadmm

Let me know when the history is cleaned up and I'll merge.

- Restore old public API and comments. - Document current limitations, caml_process_pending_actions_exn is uninmplemented for now. - Move pending_actions back to signals.c - Reintroduce caml_process_pending_actions_with_root for subsequent commit - Remove superfluous functions - Try to restore discipline of _removing_ internal functions that directly raise when possible (caml_process_pending_signals). This is very preliminary work to reintroduce control over asynchronous exceptions. The first two points are desirable for 5.0. No behaviour change expected. No change entry needed.

Polling locations were tightened for memprof. They partially disappeared during multicore's rebase on top of 4.12 since multicore did not support memprof or the new polling machinery. In addition, it was inconsistent whether these changes and/or the comments relating them to memprof were kept or removed. These changes will be easy to miss when working to restore memprof unless someone goes through the diff of the multicore PR. Since I have already done it, this will make sure that nothing is accidentally lost. This commit is meant to be behaviour-preserving. No change entry needed.

Shorten the name caml_check_for_pending_signals for consistency

gadmm · 2022-02-10T11:52:19Z

It is clean now, thanks.

sadiqj · 2022-02-10T15:35:37Z

Brilliant, thanks for your work on this @gadmm

gadmm changed the title ~~Multicore review no changes~~ Simplifications/cleanups/clarifications from multicore review Jan 28, 2022

xavierleroy mentioned this pull request Jan 30, 2022

switch to strerror_r for reentrant error string conversion #10969

Closed

xavierleroy approved these changes Jan 30, 2022

View reviewed changes

runtime/domain.c Outdated Show resolved Hide resolved

sadiqj added the no-change-entry-needed label Jan 30, 2022

sadiqj reviewed Feb 1, 2022

View reviewed changes

gadmm force-pushed the multicore_review_no_changes branch from 3a78d18 to 10edc37 Compare February 8, 2022 23:52

gadmm added 4 commits February 9, 2022 22:23

Remove dead mutex code

9775245

No change entry needed.

Rename caml_handle_gc_interrupt_no_async_exception into caml_handle_g…

17d657c

…c_interrupt Eventually we will only need the one that does not run finalisers, once it is implemented. No change entry needed.

gadmm force-pushed the multicore_review_no_changes branch from 10edc37 to 44dd90e Compare February 9, 2022 21:46

sadiqj approved these changes Feb 10, 2022

View reviewed changes

gadmm added 2 commits February 10, 2022 12:44

Fix caml_check_pending_actions for io.c

0a372b2

Shorten the name caml_check_for_pending_signals for consistency

gadmm force-pushed the multicore_review_no_changes branch from 44dd90e to 0a372b2 Compare February 10, 2022 11:51

sadiqj merged commit 2beaad2 into ocaml:trunk Feb 10, 2022

sadiqj mentioned this pull request Feb 16, 2022

remove unncessary Current_thread restore in caml_thread_enter_blocking_section #11019

Merged

Conversation

gadmm commented Jan 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xavierleroy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sadiqj commented Jan 30, 2022

Uh oh!

gadmm commented Jan 31, 2022

Uh oh!

gadmm commented Jan 31, 2022

Uh oh!

sadiqj commented Jan 31, 2022

Uh oh!

silene commented Jan 31, 2022

Uh oh!

silene commented Jan 31, 2022

Uh oh!

gadmm commented Jan 31, 2022

Uh oh!

silene commented Jan 31, 2022

Uh oh!

gadmm commented Jan 31, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gadmm commented Jan 31, 2022

Uh oh!

xavierleroy commented Jan 31, 2022

Uh oh!

gadmm commented Jan 31, 2022

Uh oh!

xavierleroy commented Jan 31, 2022

Uh oh!

gadmm commented Jan 31, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

silene commented Jan 31, 2022

Uh oh!

sadiqj left a comment

Choose a reason for hiding this comment

Uh oh!

sadiqj Feb 1, 2022

Choose a reason for hiding this comment

Uh oh!

gadmm Feb 8, 2022

Choose a reason for hiding this comment

Uh oh!

sadiqj Feb 10, 2022

Choose a reason for hiding this comment

Uh oh!

abbysmal Feb 14, 2022

Choose a reason for hiding this comment

Uh oh!

sadiqj Feb 14, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gadmm commented Feb 8, 2022

Uh oh!

gadmm commented Feb 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sadiqj left a comment

Choose a reason for hiding this comment

Uh oh!

gadmm commented Feb 10, 2022

Uh oh!

sadiqj commented Feb 10, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

gadmm commented Jan 28, 2022 •

edited

Loading

gadmm commented Jan 31, 2022 •

edited

Loading

gadmm commented Jan 31, 2022 •

edited

Loading

gadmm commented Feb 9, 2022 •

edited

Loading