Masking of asynchronous callbacks by gadmm · Pull Request #8961 · ocaml/ocaml

gadmm · 2019-09-22T17:58:40Z

Asynchronous callbacks (signal handlers, finalisers, memprof callbacks) can raise exceptions “out of thin air”; so-called interrupts (SML). This PR implements the masking of asynchronous callbacks which enforces critical sections in which asynchronous exception are guaranteed not to appear, as suggested in Dolan et al., Concurrent System Programming with Effect Handlers, §4.1.

The commits also have detailed logs; this PR is best read commit-by-commit.

1. Motivation

Currently, OCaml has two incompatible dialects: a dialect with asynchronous exceptions, very popular within a small community of developers and researchers of language tools (see #8873 (comment)), and a dialect with more-or-less reliable resource management (using unwind-protect-based abstractions when possible), popular within a small community of systems programmers. The dialects are incompatible because asynchronous exceptions make it currently impossible to reason about clean-up of resources, and both the requirements of asynchronous exceptions and resource management permeate whole programs. This work a pre-requisite to any attempt to reconcile the two dialects.

The problem is going to be worsened in OCaml multicore, which introduces a primitive notion of resource (fibers) that must be disposed-of explicitly and reliably: failing to do so (as of now) leaks the memory allocated for the fiber, and more importantly any resources "owned" by the fiber (i.e. freed via the exceptional termination of the thread). Asynchronous exceptions were correctly identified as a difficulty in Dolan et al.

The end goal (of which this is only a technical step) is to provide a resource-management and exception-safety model for OCaml which is robust in the face of asynchronous exception, at no added (developer, performance) cost compared to other kinds of unexpected exceptions and serious errors.

In this patch, we provide a way to switch for a short amount of time to a dialect that gives some C-style all-explicit control flow but where one accepts to be more careful about what one executes. The motivation for a dynamic switch is as follows: as a first approximation, one can imagine that this control could be provided by some static annotation on functions that instructs the compiler to remove opportunities for asynchronous callbacks to run. Allocations in particular would be replaced by new variants that do not run callbacks, similarly to what allocations in the C API currently do. In this first approximation, it would be possible to write safe resource-handling functions as desired, but at the cost of not being able to re-use existing code: one would have to write a non-polling variant of functions on lists, implement variants of runtime functions for arrays and system calls with a different polling behaviour, etc. It forces code duplication. To avoid this, we use a run-time switch understood by existing code, instead of static annotations. The role of a run-time switch is therefore to provide a polymorphism of polling behaviour for library code.

2. Overview

This PR includes a new per-domain state Caml_state->mask_async_callbacks, C functions to switch it (caml_mask, caml_unmask), a higher-order combinator (Sys.mask) in OCaml, and arranges the runtime so that it is easy to explore all polling locations for testing purposes. They are all meant for low-level use; the development of high-level abstractions (e.g. #8962) is outside of the scope. It introduces a distinction between possibly-raising and non-raising signal handlers, to exclude the Systhread tick from being delayed.

Optionally, it also includes a fix to a performance bug with C allocations when a signal is pending, and a combinator Sys.nonpreemptible_mask.

3. Design

A function Sys.mask is added, which guarantees that no asynchronous exceptions arise while it executes its arguments. It does so by delaying the execution of any possibly-raising asynchronous callbacks. Functions caml_{,un}mask are also provided for the FFI user. The tests demonstrates a toy implementation of an unwind-protect combinator safe of asynchronous exceptions (with_resource as proposed in #8962) inspired by GHC's bracket (it is essentially Fun.protect but adapted to protect against interrupts, for instance no exception should arise from resource release ever). Sys.mask can also be used on its own to solve issues such as presented in #8794.

The terminology mask comes from Reppy et al's work on interrupts in SML (https://people.cs.uchicago.edu/~jhr/papers/1990/cornell-tr-1144.pdf) and GHC (https://www.microsoft.com/en-us/research/publication/asynchronous-exceptions-haskell-3/). There is some delta between the paper and actual GHC. GHC has low-level scoped combinators mask (for interruptible mask) and uninterruptibleMask, that both provide an additional argument unmask, and which are then used to define higher-level abstractions such as bracket. We discuss the distinction interruptible/uninterruptible mask, and the unnecessity of unmask below.

Sys.mask needs sometimes to be used in combination with the polling behaviour of OCaml to provide useful critical sections (cf. the various comments (* BEGIN ATOMIC *) to see this combination in the testsuite). (The clarification of this polling behaviour is outside of the scope of this PR, see "Critical sections" at the end.)

We treat the systhread tick (a pseudo-signal which causes a context-switch every 50ms with system threads) as a special case. In principle, resource acquisition and release can call blocking system calls, in which case one wants to switch context (as performed inside caml_enter_blocking_section). So reacting to the systhread tick, which never interrupts the running thread with an exception, is always valid inside the mask. On the opposite side, I thought of the possible consequences of blocking the systhread tick inside the mask, while necessarily keeping context-switch inside caml_enter_blocking_section. Programmers might then be tempted to use Sys.mask to control the context-switching behaviour to ensure that no asynchronous or concurrent code runs at all (e.g. for races on mutable data). But this would be offering a dangerous interface for this job, given that it gives relevance to whether a function calls or not caml_enter_blocking_section (transitively), something that should remain an implementation detail permitted to change over time. The solution retained is to never block the systhread tick inside Sys.mask, and in addition to offer for discussion an optional Sys.nonpreemptible_mask better designed for that purpose (see below).

vs. unmask

By design, an unmask combinator as suggested in Dolan et al. is not proposed with this PR. Instead of unmask, the proof-of-concept with_resource is implemented by combining masking with OCaml's predictable polling behaviour. The latter ought to be strengthened and simplified with an explicit construct to control polling points, currently missing. But I have found that this technique makes reasoning about polling and resource safety easy, whereas there are things that I do not see how to do with an unmask combinator alone easily, like implementing the moving of resources in a way that is async-safe. Things like making a tail call after moving a resource are likely to be impossible with unmask on its own due to its scoped nature.

Non-raising handlers

As explained, we treat the systhread tick and blocking sections similarly since they are similar from a context-switching behaviour. Since the signal handler does not raise exceptions, it is allowed to run inside masks. This is implemented using a new notion of non-raising signal handler.

For backwards-compatibility reasons (avoiding to change the definition of Sys.signal_behavior), the interface to install a non-raising signal handler is not exposed via the Sys module, but remains available via a C primitive.

This could be left for explicitly undocumented low-level use, or one could decide on the contrary to have more kinds of non-raising callbacks that are allowed to run inside a mask in the future. (It could be desirable for instance to run finalisers that do simple release inside uninterruptible masks, since it would be realistic to write them carefully so that they do not raise, but an audit of finalisers in the wild has shown that changing this for all finalisers would be a largely-breaking change, so I did not explore this path.)

Non-preemptible mask

I propose for discussion the natural extension of Sys.mask to a variant Sys.nonpreemptible_mask, which disables all callbacks and context-switches for the duration of the scope. It offers the guarantee guarantees that no OCaml code runs asynchronously or concurrently. Accepting or not this part of the PR is matter of including or not the relevant commits.

One lesson of OCaml's cooperative concurrency libraries is that some people find it easier to control where context switches may happen, in the context of concurrency for IO. In addition, some OCaml code, for instance core's nano_mutex, take advantage of the absence of context switching and asynchronous code in allocation-free code to program efficient concurrency primitives. Sys.nonpreemptible_mask extends this capability to code that may allocate. On the one hand, avoiding allocations seems to have been sufficient for this purpose so far. On the other hand, OCaml programmers might have some experience of current limitations to report, or could find some other creative uses.

The programmer would rather avoid calling blocking calls inside Sys.nonpreemptible_mask, and the failure mode can be discussed. For simplicity, this PR unconditionally fails the call to caml_enter_blocking_section with an exception. While it is much better to unconditionally fail than to silently fail (by doing the context-switch), it is still desirable to keep an implementation detail the fact of whether a function transitively asks to release the runtime lock, and it would be possible to fail with an exception on some blocking calls whilst allowing calls to caml_enter_blocking_section in general without doing the context-switch. Depending on discussions, this would be done in a later PR.

vs. Interruptible mask

GHC also implements an "interruptible mask" in which asynchronous exceptions are masked but blocking operations may still be interrupted. The interruptible mask sounds like the appropriate choice for resource acquisition, during which, unlike resource release, exceptions are allowed if they can be expected (strong exception-safety), and one is likely to find blocking operations during acquisition that need to be interruptible.

However, this is more a question of library design for explicitly-interruptible blocking operations. During the work on EINTR-based signals #9722, we have seen that there are not many operations in the standard library that both explicitly poll on EINTR and then restart if no signal handler raises. For many blocking calls, running inside an uninterruptible mask will simply cause an Unix(EINTR,_,_) exception that the caller must catch, who can then decide to poll explicitly from OCaml. (This will require an explicit polling function that ignores the mask, not yet included.) For the remaining ones it might be appropriate to think of another interface to request explicit polling. This is less a question about the semantics of masking, and I prefer to postpone it to a later discussion (such as #8962) where it will still be possible to add an interruptible mask if it turns out to be needed.

(Thanks to @stedolan for an extensive discussion on this point.)

4. Implementation

When an asynchronous callback needs to run, action_pending in signals.c is set, and the runtime is notified via caml_notify_action. Notifying the program currently means setting young_limit to young_alloc_end so that the next allocation triggers the gc. To make masking efficient, one does not change how polling is done, but how the program is notified. Only if the new Caml_state->mask_asynchronous_callbacks is below the value of action_pending does the runtime notify the program. The program is also notified when unmasking if needed. Various levels of masking are provided, depending on whether they also delay non-raising callbacks.

More details are given in the commit logs.

Testing improvements

Testing the correct polling behaviour in OCaml is complicated. Tests based on sending signals from one thread to another are slow, brittle and nondeterministic. A few things are added to simplify testing, some primitives and notably some arrangements are made so that it is possible to re-record a signal from its own signal handler, allowing to explore all possible polling locations in a deterministic way.

Critical sections

The implementation of Sys.mask shows several critical sections ((* BEGIN/END ATOMIC *)) that cannot currently be enforced in bytecode and flambda. They are there to specify that the masking extends to the receiving and the returning of values, a fact described in the documentation. This is a lesson from implementing the with_resource combinator: Sys.mask works better if it combines well with non-polling sections on the caller side. But to observe this property, one needs to be able to enforce such critical sections in the caller's end in the first place. In backends where such critical sections cannot be enforced, there is nobody to observe the property. Therefore, the specification is vacuously true. Correct uses of Sys.mask in the meanwhile either carefully inline or solve simpler problems (e.g. #8994).

young_limit_c

An inefficiency has been noticed in the current polling mechanism in C, leading to the introduction of Caml_state->young_limit_c. This is mostly orthogonal to this PR. This is needed in this PR only due to the decision to not export action_pending, but the fix stands on its own to solve a problem that is already there. Godbolt has been used to check that the changes to the allocation macro do not affect generated code (more details in the commit log).

toots · 2019-09-30T04:35:30Z

That looks very interesting. I'll try to test it when I get a chance, thank you.

gadmm · 2019-12-17T14:41:36Z

Quick update: I am now convinced that we should introduce a distinction between raising and non-raising signal handlers. My next step is to write a RFC so that we can agree on the design before I implement it.

gadmm · 2020-07-15T00:22:06Z

I have updated the description to match the new (and rebased) implementation. It distinguishes (possibly) raising from "non-raising" signal handlers. This PR is not yet complete; I need to see where the interruption of blocking system calls is headed (it is not possible to put the PR back into draft mode). It currently probably misses another kind of mask, that still polls during blocking sections (i.e. that turns automatic polling into explicit polling), like in GHC, for which I think there can be some use in the acquisition of resources. Whether this makes sense will depend on EINTR-related patches.

In the meanwhile, I remember that the strongest mask Thread.uninterruptible that disables non-raising signal handlers (systhread's yield) was something that we mentioned with @stedolan. Stephen, do I remember correctly that you have some use-cases for this?

Lastly, I mentioned an RFC, but I managed to avoid breaking changes I had in mind, so I hope that an RFC will not be necessary.

gadmm · 2020-12-02T19:45:38Z

I think the PR is mature and ready for review. (It can interest @stedolan, @jhjourdan; also @lpw25 for the design part.)

Rebased on top of the numerous changes to the related code.
Changed terminology for a better one.
Solved remaining questions about the correctness of critical sections and the (lack of) need of an interruptible mask à la Haskell.
Updated the description.

The rebasing was a bit tedious, and I know that there is parallel work on the semantics of signals on the multicore side. To avoid further divergence of code, this PR focuses on the semantics of masking, while higher-level details (e.g. #8962) will wait for later. The delay until now was mostly due to investigating the question of what to do about blocking system calls inside uninterruptible masks, related to the work on EINTR behaviour, which turned out to be orthogonal to the current PR. (See the note on interruptible masks.)

Comments welcome.

gasche

I read the PR description (it is detailed and interesting, thanks, but there are parts I don't understand, in particular the comparison with other approaches in the "design" section: I don't remember what with_resource is and how it relates to unmask, and I don't understand the "Non-preemptible mask" section). I also started looking at the code, here are a few review comments.

gasche · 2020-12-25T21:38:25Z

runtime/signals_nat.c

+#if defined(CONTEXT_PC) && defined(CONTEXT_YOUNG_LIMIT)
+  if (caml_find_code_fragment_by_pc((char *) CONTEXT_PC) != NULL)
+    CONTEXT_YOUNG_LIMIT = (context_reg) Caml_state->young_limit;
+#endif


Naive question about the CONTEXT_YOUNG_LIMIT stuff: if I understand correctly, including this code in caml_notify_action means that it will now run in the two calls to caml_notify_action in signals.c, which previously did not do this stuff, they would only update Caml_state->young_limit. Why is this correct? What is this code doing? (It looks potentially expensive.) Will it be trivially disabled or a no-op in signals.c for a reason I don't see?

On the one hand, the caching of young_limit in a register is going away with #9876, so the highlighted code will go away.

On the other hand, this code was wrong to begin with. CONTEXT_PC and CONTEXT_YOUNG_LIMIT can only be used from a signal handler defined using DECLARE_SIGNAL_HANDLER, because the latter refer to function parameters of the former.

Thanks for the clarification; this code should be lifted out of the function and back into the callsite where it was used.

Yes, I realised that this old code of mine was fishy, but it was meant to disappear with that other PR so I did not pay more attention to it. Sorry for this unnecessary (and wrong) commit.

I removed the erroneous fix.

gasche · 2020-12-25T21:42:13Z

runtime/signals.c

  caml_raise_if_exception(exn);
 }

+CAMLprim value caml_sys_record_signal(value signal_number)


If I understand correctly, this function pretends that a certain signal is received. This is meant for testing. Both these informations could be given in a comment on the function declaration (they are implicit in the commit, but code reader from the future may not look at the commit first to understand why this function is there).

gasche · 2020-12-25T22:10:28Z

runtime/signals.c

  Caml_state->requested_major_slice = 1;
  caml_set_action_pending();
+  // update Caml_state->young_limit_c
+  caml_update_young_limit();


I find this code (and the code below) rather fishy. If I read correctly:

first we set requested_major_slice = 1

then set_action_pending will set something_to_do = 1 and call notify_action, which sets young_limit = young_alloc_end

then update_young_limit will see that requested_major_slice==1 and set young_limit_c = young_alloc_end and then (vacuously) young_limit = young_limit_c

There is a lot of redundancy in here, and the treatment of young_limit_c in update_young_limit feels rather ad-hoc.

I would propose the following as a simpler approach, but I am not sure that it is correct:

Define a caml_set_gc_pending function as a simpler version of caml_set_action_pending for GC requests. This function sets (something_to_do=1 and) young_limit_c and young_limit to young_alloc_end unconditionally.

Use it here instead of caml_set_action_pending (no need to separately update the young limits).

Remove the conditional in caml_update_young_limit introduced by the current patch (the new case is not used anymore, iiuc.) and only add an update to synchronize young_limit and young_limit_c.

Ideally, caml_update_young_limit would be the reference function to set the young limits to their correct values. The new test inside caml_update_young_limit intends to make the invariants of this functions less brittle, otherwise one has to be careful not to call it in places where a gc might be pending as it will forget the gc request. As a matter of fact, this seems to have been respected until recently, but currently if you call certain memprof functions after requesting a GC then the runtime could forget your request. Not a huge bug, but it shows that the invariants were too brittle.

Given that, it seemed like premature optimisation to me to introduce a specialised function to do the effect of update_young_limits without the redundancy. But since you seem to really dislike the detour, I tried to fix it differently.

I'm not sure where you "tried to fix it differently", is it in the fixup commit 56af457 ? I'm slightly confused because this seems to be a simple commit factorizing two lines of code (I see no detour removal in that patch), but then it looks like at this point the detour is gone anyway (it contains code that I have not reviewed yet).

The detour is not a problem at runtime, but I find it confusing in terms of code comprehension: when we perform redundant operations it sometimes mean that the design could/should be sharpened.

My rebase mistake. The fixup commit was intended to show the fix, but instead the fix is already in the original commit (set action_pending by hand without calling notify_action) and the fixup does a small refactoring.
The end version should indeed be less confusing for readers.

gadmm · 2020-12-27T20:11:22Z

I read the PR description (it is detailed and interesting, thanks, but there are parts I don't understand, in particular the comparison with other approaches in the "design" section: I don't remember what with_resource is and how it relates to unmask, and I don't understand the "Non-preemptible mask" section).

Thanks for the feedback, I have reworked the PR description. Let me know if it is better.

gasche

More review comments (halfway through the "Implement Caml_state->mask_async_callbacks" commit).

runtime/signals.c

gasche · 2020-12-28T07:35:04Z

runtime/minor_gc.c

-    /* If not, then empty the minor heap, and check again for async
-       callbacks. */
+    /* If not, then empty the minor heap, and check again in case of
+       possible new memprof callbacks in the queue. */


Are memprof callbacks really the only sort of callbacks that could get queued by caml_gc_dispatch? I would think of signals and finalisers as well.

Finalisers could be queued, but this behaviour of running them in the loop did not exist before memprof and was motivated by running memprof callbacks as soon as possible in general (AFAIU).

If I understand correctly, the motivation for checking again is memprof callbacks, but what we actually check in the queue contains more kinds of async callbacks. (If we were to add a new sort of asynchronous callbacks in the future, would it also be checked in the queue? I think yes.) I thus find the previous comment more informative than the one you propose, which to me suggests (misleadingly?) that only memprof callbacks are checked / run after emptying the minor queue.

I'll have a look

Changes

gasche · 2020-12-28T07:38:10Z

otherlibs/systhreads/st_stubs.c

  curr_thread->backtrace_buffer = Caml_state->backtrace_buffer;
  curr_thread->backtrace_last_exn = Caml_state->backtrace_last_exn;
  caml_memprof_leave_thread();
+  curr_thread->mask_async_callbacks = Caml_state->mask_async_callbacks;


Very nitpicky: to me mask reads as a verb in mask_async_callbacks and as a noun in async_callbacks_mask.

It is indeed a verb.

runtime/caml/signals.h

gasche · 2020-12-28T09:28:24Z

runtime/caml/signals.h


 CAMLextern int caml_check_pending_actions (void);
-/* Returns 1 if there are pending actions, 0 otherwise. */
+/* Returns 1 if there are actions to do at current mask level, 0


very nitpicky: breaking the line right after the comma would be nicer

I leave it up to my emacs macro...

runtime/memprof.c

gasche · 2020-12-28T09:33:41Z

runtime/memprof.c

-   minor heap, from the C code (the handling is different when called
-   from natively compiled OCaml code). */
+   minor heap. Execute allocation callbacks immediately unless if
+   called from C code or if asynchronous callbacks are masked. */


Naive question: could we have a simpler system if we just considered that going to C masks asynchronous callbacks, to have a single notion instead of two? In particular:

are FROM_C / FROM_CAML equivalent to assuming a particular mask level on the C side, or do they do other things?

could we change young_limit_c to instead be young_limit_for_mask_level_<foo>, does that suggest a different code organization, and would we also get performance benefits when running masked OCaml code?

Thinking more about this: even if we consider that young_limit_c is in fact young_limit_no_large_callback, a key point is that we have static knowledge that we are "within C code" so we statically know which limit to check. On the other hand, OCaml code running under a mask cannot (in general) know it without a dynamic check (or at least some offset computation to get the limit for the current mask level), which would be a performance regression in the limit-check hot path. This suggests that a conceptual unification may be possible, but that it does not magically improve the implementation.

(Note: we could have a check for the mask level at the entry of the cold path of Alloc_small, but that still probably is not worth it, as it increases code size for an optimization that would fire rarely.)

The C functions in the runtime are not considered as running in a mask: they do poll, they just decouple the allocations from the safe polling locations. In particular we want to control their polling location.

As you explain it, this bug does not appear in OCaml code due to the dynamic nature of the mask and how it is implemented (whereby young_limit is always young_limit_for_the_current_mask_level).

That being said, I understand where the questions come from and I asked them myself when I was implementing it. To see how one arrives at this organisation, one has to start at the thought experiment at the end of the Motivation section of the PR description, where control on polling was given by a static annotation on OCaml functions. In that experiment, functions instructed to not poll would need to have the allocations replaced by ones that compare to the equivalent of young_limit_c, and programmers could insert polls by hand where they see fit. Then, this would be the OCaml equivalent of what is currently done via the C API. So young_limit_c really means young_limit_but_do_not_poll. The conclusion of the thought experiment is that the purpose of a runtime flag compared to a static annotation is code re-use, so: - a way is devised to avoid this performance bug in the absence of static control (the same compiled code must perform allocations with and without mask), - existing polling locations in the C parts of the runtime must obey the mask (thus, running a C function must be different from running under a mask). This is how you get to the current organisation.

gasche · 2020-12-28T09:37:42Z

runtime/memprof.c

      if (t_idx == Invalid_index) continue;
-      res = run_alloc_callback_exn(t_idx);
+      if (Caml_state->mask_async_callbacks == CAML_MASK_NONE)
+        res = run_alloc_callback_exn(t_idx);


If I understand correctly, the rest of the loop-iteration code below is useless if the callback is not immediately run, so you should be able to just continue in the masked case.

Note: to me checking == CAML_MASK_NONE corresponds to a fragile assumption the mask lattice, that there is no intermediary mask between NONE and masks_memprof_callbacks. I would rather write the code as < memprof-callback-mask.

Except for the if (!stopped) ... at the end, right?

memprof.c and signals.c already have tightly coupled assumptions, which is why it is fine to test == CAML_MASK_NONE.

I now understood that you meant replacing == CAML_MASK_NONE with < CAML_MASK_UNINTERRUPTIBLE, not introducing a new variable for generality.

Except for the if (!stopped) ... at the end, right?

Yes, you are right, my suggestion above changes the behavior compared to your patch. But I am not sure that either one (your patch or my suggestion) is correct, and I find it difficult to understand what should be done here. For example, with your version, res is not updated when the mask is set, so it could retain an exceptional value that it got from a previous iteration of the loop, and break on line 897 due to an old/stale value. Now we would normally assume that the mask will not be set during this loop (or locally inside a callback and unset again before the next iteration), I don't know if this can happen.

If I understand correctly, what you want to do about the interaction of masking of memprof is to keep sampling as usual, just not run the callbacks. This sounds sensible but I am skeptical that the current implementation is correct (or clear enough for me to check its correctness). We should probably ask @jhjourdan or @stedolan to look at this part of the change.

Yes, the reasoning is that this way it behaves similarly to the call to maybe_track_block in the case of an alloc from C, near the beginning of the function.

res is not an issue: it cannot be an exceptional value or it would have exited the loop (and although it is not rooted, there is never any heap allocation between the moment it is set and the moment it is read).

We are allowed to assume that calls to caml_mask/caml_unmask are well bracketed, but I do not think this is relevant here.

gasche · 2020-12-28T09:43:39Z

runtime/caml/signals.h

+
+   This gives three possible values for
+   [Caml_state->mask_async_callbacks]: CAML_MASK_NONE,
+   CAML_MASK_UNINTERRUPTIBLE and CAML_MASK_NONPREEMPTIBLE.


Beginner comment: to me "to interrupt" and "to preempt" are mostly synonymous, so I must say that I find the naming a bit confusing. Are you sure that it is very clear to more knowledgeable people? Would you maybe have alternate naming suggestions?

(For example I wondered if we could call short non-raising callbacks "short callbacks" or "small callbacks", and asynchronous callbacks "long callbacks" or "large callbacks", and whether that would suggest names here.)

I now made sure to define interrupt at the beginning of that documentation comment. In SML parlance, an interrupt is a special kind of exception that cancels the current thread asynchronously.

I have plans to eventually name the last level differently if the notion of non-preemptible mask does not make it.

I do not like small/long, which do not mean much. What matters is whether the current thread can be cancelled at that point (affine vs. linear), which is what raising vs. non-raising reflects to me.

gadmm · 2020-12-28T19:32:43Z

Thanks, I have addressed the latest round of comments.

gadmm · 2020-12-28T22:17:39Z

The last change was due to a mistake in the commit "Add Sys.nonpreemptible_mask" (not yet reviewed). What I wanted to do is a bit more complicated to implement so I pushed a simpler version for the sake of the discussion. I have updated the PR description accordingly.

If an action is pending in C code, young_limit forces to take the expensive path every time, until the pending action is cleared. To avoid this, we check against a new limit young_limit_c instead of young_limit in Alloc_small_aux from now on, used to record the next implicit or requested major slice and minor collection. To make sure memprof samples the blocks, we record caml_memprof_young_trigger in young_limit_c. * I checked with Godbolt that the new branching inside Alloc_small_limit is erased at compilation time. This is true for all compilers proposed by Godbolt (including old ones) except ppci and cc65. In addition, all compilers give identical output as if there was no branching with -O0, except msvc which sometimes required -O1 to remove a noop. * There is a new call to `caml_notify_action` inside `caml_alloc_small_dispatch` via `caml_set_action_pending` that fixes the behaviour where action_pending might be set but the program not notified when returning to OCaml code. This is also used to stop exporting caml_something_to_do later on. Without the rest of this commit, the change would slow down all C allocations up to the next explicit poll.

* Preparations for masking. * `caml_something_to_do` was declared as internal but was exported and used in ocamlcc (among all opam packages). This usage in ocamlcc is superseded by the official interface `caml_process_pending_actions_exn`, and `caml_check_pending_actions`.

This is a flag to control the delaying of asynchronous callbacks (signal handlers, finalisers and memprof callbacks). We distinguish two kinds of asynchronous actions: - unrestricted asynchronous callbacks: possibly-raising signal handlers, memprof callbacks and finalisers (including GC alarms). - non-raising actions, such as non-raising signal handlers (e.g. systhread's yield). (The latter is set as a non-raising handler in an ulterior commit.) Masking temporarily delays the execution of asynchronous callbacks. New functions caml_mask and caml_unmask to control which kind of asynchronous callbacks are delayed.

So, it will no longer be delayed by Sys.mask. For simplicity and to avoid changing the definition of Sys.signal_behavior, "non-raising" signal handlers are not exposed through the Sys module, but are accessible through the new primitive.

This polling was introduced at 51c55b2 to "preserve the semantics of the program w.r.t. exceptions". The implementation was a bit hackish since it needed to poll again right after executing callbacks. This is no longer needed for this purpose now that minor allocations poll in bytecode (since 4.10). Furthermore, a signal arising right before the instruction cannot be distinguished from a signal arising just after. Removing this polling simplifies the code, but also prevents looping when recording a signal from a signal handler (which is very useful to purposefully explore all polling locations of a function, e.g. in testing, as used in subsequent commits).

Remove caml_process_pending_signals_exn, replaced by calls to actions. Then remove signals_are_pending which was used as a cache to make the test inside caml_enter_blocking_section efficient. As a side-effect, this permits to record a signal from a signal handler, very useful to have robust tests. We add such a test as a control for tests in subsequent commits.

It tests a possible future Fun.with_resource similar to Haskell's bracket.

gasche · 2023-03-03T14:02:11Z

@stedolan is this work that you would like to eventually seen integrated in OCaml?

(Obviously this PR was against the OCaml 4 runtime and would need to be rewritten on top of of OCaml 5. But before considering doing this, it would be useful to know if the feature has any chance of being integrated at all. This requires work, and who would do it?)

stedolan · 2024-03-12T11:23:28Z

@lpw25 and I have been talking recently with @gadmm about this PR and masking / async interrupts in general, and we're at least close to agreeing how it should work! We're hoping to make progress on this PR soon.

This was referenced Sep 22, 2019

[RFC] With_resource #8962

Draft

API to prevent automatic Gc collection #8794

Closed

gadmm mentioned this pull request Sep 29, 2019

Resource-safe C interface, part 2 #8997

Open

gadmm force-pushed the Sys.mask branch from 3c240e3 to 79f9d4c Compare September 29, 2019 21:26

gadmm mentioned this pull request Oct 1, 2019

Guarantee that allocation functions do no trigger callbacks when called from C #8691

Merged

This was referenced Oct 9, 2019

Fixes to #8691 #9027

Merged

Resource-safe C interface, part 1 (the 4.10 backwards-compatibility edition) #8993

Merged

gadmm force-pushed the Sys.mask branch from 79f9d4c to ecf9a8c Compare June 20, 2020 00:52

gadmm mentioned this pull request Jun 29, 2020

EINTR-based signals, again #9722

Merged

1 task

gadmm mentioned this pull request Oct 13, 2020

Memtrace does not play well with asynchronous exceptions janestreet/memtrace#2

Open

gadmm force-pushed the Sys.mask branch 4 times, most recently from 6f4754d to 7ca0abd Compare December 2, 2020 18:14

gadmm force-pushed the Sys.mask branch from 4914332 to 9ae4a61 Compare December 3, 2020 17:11

gadmm mentioned this pull request Dec 3, 2020

Safepoints #10039

Merged

gasche reviewed Dec 25, 2020

View reviewed changes

gasche reviewed Dec 28, 2020

View reviewed changes

gadmm force-pushed the Sys.mask branch 2 times, most recently from 56af457 to f730981 Compare December 28, 2020 18:50

gadmm force-pushed the Sys.mask branch from f730981 to dec7d9e Compare December 28, 2020 21:15

gadmm force-pushed the Sys.mask branch from dec7d9e to 2bbc124 Compare December 28, 2020 22:30

Factor async notification delivery logic with notify_action

667008d

gadmm added 15 commits January 20, 2021 23:00

Primitive to test signals

bbbdf10

[minor] Documentation

2eee8f5

Register systhread's tick as non-raising

b8f719c

So, it will no longer be delayed by Sys.mask. For simplicity and to avoid changing the definition of Sys.signal_behavior, "non-raising" signal handlers are not exposed through the Sys module, but are accessible through the new primitive.

Prevent all asynchronous exceptions during uncaught exception handling

6ec6922

Function Sys.mask to run with asynchronous callbacks delayed

5a5d89f

Import tests from the future

55bd7c5

It tests a possible future Fun.with_resource similar to Haskell's bracket.

Add Sys.nonpreemptible_mask

1e3e4ea

Adapt tests for bytecode

ee80da4

Fix tests for Windows?

ab4bd7f

gasche's comments

0480c9c

gadmm force-pushed the Sys.mask branch from 2bbc124 to 0480c9c Compare January 20, 2021 22:07

gadmm mentioned this pull request Jun 25, 2021

Add [@poll error] attribute #10462

Merged

gadmm mentioned this pull request Jul 1, 2022

Exception raising Stdlib.at_exit callback breaks Stdlib.exit #11221

Open

gasche mentioned this pull request Mar 31, 2023

Meta-issue on asynchronous action handling in multicore #10915

Closed

20 tasks

gadmm mentioned this pull request Jul 22, 2023

Fix async exception handling inside Domain.spawn #12409

Merged

gasche mentioned this pull request Mar 19, 2024

Force all domain threads to exit before the main thread #13010

Closed

Conversation

gadmm commented Sep 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. Motivation

2. Overview

3. Design

vs. unmask

Non-raising handlers

Non-preemptible mask

vs. Interruptible mask

4. Implementation

Testing improvements

Critical sections

young_limit_c

Uh oh!

toots commented Sep 30, 2019

Uh oh!

gadmm commented Dec 17, 2019

Uh oh!

gadmm commented Jul 15, 2020

Uh oh!

gadmm commented Dec 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gasche left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gasche Dec 25, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gadmm Dec 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gadmm commented Dec 27, 2020

Uh oh!

gasche left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gadmm commented Sep 22, 2019 •

edited

Loading

gadmm commented Dec 2, 2020 •

edited

Loading

gasche Dec 25, 2020 •

edited

Loading

gadmm Dec 28, 2020 •

edited

Loading

gadmm Dec 28, 2020 •

edited

Loading

gadmm Dec 28, 2020 •

edited

Loading