Conversation
|
That looks very interesting. I'll try to test it when I get a chance, thank you. |
|
Quick update: I am now convinced that we should introduce a distinction between raising and non-raising signal handlers. My next step is to write a RFC so that we can agree on the design before I implement it. |
|
I have updated the description to match the new (and rebased) implementation. It distinguishes (possibly) raising from "non-raising" signal handlers. This PR is not yet complete; I need to see where the interruption of blocking system calls is headed (it is not possible to put the PR back into draft mode). It currently probably misses another kind of mask, that still polls during blocking sections (i.e. that turns automatic polling into explicit polling), like in GHC, for which I think there can be some use in the acquisition of resources. Whether this makes sense will depend on EINTR-related patches. In the meanwhile, I remember that the strongest mask Lastly, I mentioned an RFC, but I managed to avoid breaking changes I had in mind, so I hope that an RFC will not be necessary. |
6f4754d to
7ca0abd
Compare
|
I think the PR is mature and ready for review. (It can interest @stedolan, @jhjourdan; also @lpw25 for the design part.)
The rebasing was a bit tedious, and I know that there is parallel work on the semantics of signals on the multicore side. To avoid further divergence of code, this PR focuses on the semantics of masking, while higher-level details (e.g. #8962) will wait for later. The delay until now was mostly due to investigating the question of what to do about blocking system calls inside uninterruptible masks, related to the work on EINTR behaviour, which turned out to be orthogonal to the current PR. (See the note on interruptible masks.) Comments welcome. |
gasche
left a comment
There was a problem hiding this comment.
I read the PR description (it is detailed and interesting, thanks, but there are parts I don't understand, in particular the comparison with other approaches in the "design" section: I don't remember what with_resource is and how it relates to unmask, and I don't understand the "Non-preemptible mask" section). I also started looking at the code, here are a few review comments.
runtime/signals_nat.c
Outdated
| #if defined(CONTEXT_PC) && defined(CONTEXT_YOUNG_LIMIT) | ||
| if (caml_find_code_fragment_by_pc((char *) CONTEXT_PC) != NULL) | ||
| CONTEXT_YOUNG_LIMIT = (context_reg) Caml_state->young_limit; | ||
| #endif |
There was a problem hiding this comment.
Naive question about the CONTEXT_YOUNG_LIMIT stuff: if I understand correctly, including this code in caml_notify_action means that it will now run in the two calls to caml_notify_action in signals.c, which previously did not do this stuff, they would only update Caml_state->young_limit. Why is this correct? What is this code doing? (It looks potentially expensive.) Will it be trivially disabled or a no-op in signals.c for a reason I don't see?
There was a problem hiding this comment.
On the one hand, the caching of young_limit in a register is going away with #9876, so the highlighted code will go away.
On the other hand, this code was wrong to begin with. CONTEXT_PC and CONTEXT_YOUNG_LIMIT can only be used from a signal handler defined using DECLARE_SIGNAL_HANDLER, because the latter refer to function parameters of the former.
There was a problem hiding this comment.
Thanks for the clarification; this code should be lifted out of the function and back into the callsite where it was used.
There was a problem hiding this comment.
Yes, I realised that this old code of mine was fishy, but it was meant to disappear with that other PR so I did not pay more attention to it. Sorry for this unnecessary (and wrong) commit.
There was a problem hiding this comment.
I removed the erroneous fix.
| caml_raise_if_exception(exn); | ||
| } | ||
|
|
||
| CAMLprim value caml_sys_record_signal(value signal_number) |
There was a problem hiding this comment.
If I understand correctly, this function pretends that a certain signal is received. This is meant for testing. Both these informations could be given in a comment on the function declaration (they are implicit in the commit, but code reader from the future may not look at the commit first to understand why this function is there).
runtime/signals.c
Outdated
| Caml_state->requested_major_slice = 1; | ||
| caml_set_action_pending(); | ||
| // update Caml_state->young_limit_c | ||
| caml_update_young_limit(); |
There was a problem hiding this comment.
I find this code (and the code below) rather fishy. If I read correctly:
- first we set
requested_major_slice = 1 - then
set_action_pendingwill setsomething_to_do = 1and callnotify_action, which setsyoung_limit = young_alloc_end - then
update_young_limitwill see thatrequested_major_slice==1and setyoung_limit_c = young_alloc_endand then (vacuously)young_limit = young_limit_c
There is a lot of redundancy in here, and the treatment of young_limit_c in update_young_limit feels rather ad-hoc.
I would propose the following as a simpler approach, but I am not sure that it is correct:
- Define a
caml_set_gc_pendingfunction as a simpler version ofcaml_set_action_pendingfor GC requests. This function sets (something_to_do=1and)young_limit_candyoung_limittoyoung_alloc_endunconditionally. - Use it here instead of
caml_set_action_pending(no need to separately update the young limits). - Remove the conditional in
caml_update_young_limitintroduced by the current patch (the new case is not used anymore, iiuc.) and only add an update to synchronizeyoung_limitandyoung_limit_c.
There was a problem hiding this comment.
Ideally, caml_update_young_limit would be the reference function to set the young limits to their correct values. The new test inside caml_update_young_limit intends to make the invariants of this functions less brittle, otherwise one has to be careful not to call it in places where a gc might be pending as it will forget the gc request. As a matter of fact, this seems to have been respected until recently, but currently if you call certain memprof functions after requesting a GC then the runtime could forget your request. Not a huge bug, but it shows that the invariants were too brittle.
Given that, it seemed like premature optimisation to me to introduce a specialised function to do the effect of update_young_limits without the redundancy. But since you seem to really dislike the detour, I tried to fix it differently.
There was a problem hiding this comment.
I'm not sure where you "tried to fix it differently", is it in the fixup commit 56af457 ? I'm slightly confused because this seems to be a simple commit factorizing two lines of code (I see no detour removal in that patch), but then it looks like at this point the detour is gone anyway (it contains code that I have not reviewed yet).
The detour is not a problem at runtime, but I find it confusing in terms of code comprehension: when we perform redundant operations it sometimes mean that the design could/should be sharpened.
There was a problem hiding this comment.
My rebase mistake. The fixup commit was intended to show the fix, but instead the fix is already in the original commit (set action_pending by hand without calling notify_action) and the fixup does a small refactoring.
The end version should indeed be less confusing for readers.
Thanks for the feedback, I have reworked the PR description. Let me know if it is better. |
gasche
left a comment
There was a problem hiding this comment.
More review comments (halfway through the "Implement Caml_state->mask_async_callbacks" commit).
runtime/minor_gc.c
Outdated
| /* If not, then empty the minor heap, and check again for async | ||
| callbacks. */ | ||
| /* If not, then empty the minor heap, and check again in case of | ||
| possible new memprof callbacks in the queue. */ |
There was a problem hiding this comment.
Are memprof callbacks really the only sort of callbacks that could get queued by caml_gc_dispatch? I would think of signals and finalisers as well.
There was a problem hiding this comment.
Finalisers could be queued, but this behaviour of running them in the loop did not exist before memprof and was motivated by running memprof callbacks as soon as possible in general (AFAIU).
There was a problem hiding this comment.
If I understand correctly, the motivation for checking again is memprof callbacks, but what we actually check in the queue contains more kinds of async callbacks. (If we were to add a new sort of asynchronous callbacks in the future, would it also be checked in the queue? I think yes.) I thus find the previous comment more informative than the one you propose, which to me suggests (misleadingly?) that only memprof callbacks are checked / run after emptying the minor queue.
| curr_thread->backtrace_buffer = Caml_state->backtrace_buffer; | ||
| curr_thread->backtrace_last_exn = Caml_state->backtrace_last_exn; | ||
| caml_memprof_leave_thread(); | ||
| curr_thread->mask_async_callbacks = Caml_state->mask_async_callbacks; |
There was a problem hiding this comment.
Very nitpicky: to me mask reads as a verb in mask_async_callbacks and as a noun in async_callbacks_mask.
|
|
||
| CAMLextern int caml_check_pending_actions (void); | ||
| /* Returns 1 if there are pending actions, 0 otherwise. */ | ||
| /* Returns 1 if there are actions to do at current mask level, 0 |
There was a problem hiding this comment.
very nitpicky: breaking the line right after the comma would be nicer
There was a problem hiding this comment.
I leave it up to my emacs macro...
| minor heap, from the C code (the handling is different when called | ||
| from natively compiled OCaml code). */ | ||
| minor heap. Execute allocation callbacks immediately unless if | ||
| called from C code or if asynchronous callbacks are masked. */ |
There was a problem hiding this comment.
Naive question: could we have a simpler system if we just considered that going to C masks asynchronous callbacks, to have a single notion instead of two? In particular:
- are FROM_C / FROM_CAML equivalent to assuming a particular mask level on the C side, or do they do other things?
- could we change
young_limit_cto instead beyoung_limit_for_mask_level_<foo>, does that suggest a different code organization, and would we also get performance benefits when running masked OCaml code?
There was a problem hiding this comment.
Thinking more about this: even if we consider that young_limit_c is in fact young_limit_no_large_callback, a key point is that we have static knowledge that we are "within C code" so we statically know which limit to check. On the other hand, OCaml code running under a mask cannot (in general) know it without a dynamic check (or at least some offset computation to get the limit for the current mask level), which would be a performance regression in the limit-check hot path. This suggests that a conceptual unification may be possible, but that it does not magically improve the implementation.
(Note: we could have a check for the mask level at the entry of the cold path of Alloc_small, but that still probably is not worth it, as it increases code size for an optimization that would fire rarely.)
There was a problem hiding this comment.
-
The C functions in the runtime are not considered as running in a mask: they do poll, they just decouple the allocations from the safe polling locations. In particular we want to control their polling location.
-
As you explain it, this bug does not appear in OCaml code due to the dynamic nature of the mask and how it is implemented (whereby
young_limitis alwaysyoung_limit_for_the_current_mask_level).
That being said, I understand where the questions come from and I asked them myself when I was implementing it. To see how one arrives at this organisation, one has to start at the thought experiment at the end of the Motivation section of the PR description, where control on polling was given by a static annotation on OCaml functions. In that experiment, functions instructed to not poll would need to have the allocations replaced by ones that compare to the equivalent of young_limit_c, and programmers could insert polls by hand where they see fit. Then, this would be the OCaml equivalent of what is currently done via the C API. So young_limit_c really means young_limit_but_do_not_poll. The conclusion of the thought experiment is that the purpose of a runtime flag compared to a static annotation is code re-use, so: - a way is devised to avoid this performance bug in the absence of static control (the same compiled code must perform allocations with and without mask), - existing polling locations in the C parts of the runtime must obey the mask (thus, running a C function must be different from running under a mask). This is how you get to the current organisation.
| if (t_idx == Invalid_index) continue; | ||
| res = run_alloc_callback_exn(t_idx); | ||
| if (Caml_state->mask_async_callbacks == CAML_MASK_NONE) | ||
| res = run_alloc_callback_exn(t_idx); |
There was a problem hiding this comment.
If I understand correctly, the rest of the loop-iteration code below is useless if the callback is not immediately run, so you should be able to just continue in the masked case.
Note: to me checking == CAML_MASK_NONE corresponds to a fragile assumption the mask lattice, that there is no intermediary mask between NONE and masks_memprof_callbacks. I would rather write the code as < memprof-callback-mask.
There was a problem hiding this comment.
Except for the if (!stopped) ... at the end, right?
memprof.c and signals.c already have tightly coupled assumptions, which is why it is fine to test == CAML_MASK_NONE.
There was a problem hiding this comment.
I now understood that you meant replacing == CAML_MASK_NONE with < CAML_MASK_UNINTERRUPTIBLE, not introducing a new variable for generality.
There was a problem hiding this comment.
Except for the
if (!stopped) ...at the end, right?
Yes, you are right, my suggestion above changes the behavior compared to your patch. But I am not sure that either one (your patch or my suggestion) is correct, and I find it difficult to understand what should be done here. For example, with your version, res is not updated when the mask is set, so it could retain an exceptional value that it got from a previous iteration of the loop, and break on line 897 due to an old/stale value. Now we would normally assume that the mask will not be set during this loop (or locally inside a callback and unset again before the next iteration), I don't know if this can happen.
If I understand correctly, what you want to do about the interaction of masking of memprof is to keep sampling as usual, just not run the callbacks. This sounds sensible but I am skeptical that the current implementation is correct (or clear enough for me to check its correctness). We should probably ask @jhjourdan or @stedolan to look at this part of the change.
There was a problem hiding this comment.
Yes, the reasoning is that this way it behaves similarly to the call to maybe_track_block in the case of an alloc from C, near the beginning of the function.
res is not an issue: it cannot be an exceptional value or it would have exited the loop (and although it is not rooted, there is never any heap allocation between the moment it is set and the moment it is read).
We are allowed to assume that calls to caml_mask/caml_unmask are well bracketed, but I do not think this is relevant here.
|
|
||
| This gives three possible values for | ||
| [Caml_state->mask_async_callbacks]: CAML_MASK_NONE, | ||
| CAML_MASK_UNINTERRUPTIBLE and CAML_MASK_NONPREEMPTIBLE. |
There was a problem hiding this comment.
Beginner comment: to me "to interrupt" and "to preempt" are mostly synonymous, so I must say that I find the naming a bit confusing. Are you sure that it is very clear to more knowledgeable people? Would you maybe have alternate naming suggestions?
(For example I wondered if we could call short non-raising callbacks "short callbacks" or "small callbacks", and asynchronous callbacks "long callbacks" or "large callbacks", and whether that would suggest names here.)
There was a problem hiding this comment.
I now made sure to define interrupt at the beginning of that documentation comment. In SML parlance, an interrupt is a special kind of exception that cancels the current thread asynchronously.
I have plans to eventually name the last level differently if the notion of non-preemptible mask does not make it.
I do not like small/long, which do not mean much. What matters is whether the current thread can be cancelled at that point (affine vs. linear), which is what raising vs. non-raising reflects to me.
56af457 to
f730981
Compare
|
Thanks, I have addressed the latest round of comments. |
|
The last change was due to a mistake in the commit "Add Sys.nonpreemptible_mask" (not yet reviewed). What I wanted to do is a bit more complicated to implement so I pushed a simpler version for the sake of the discussion. I have updated the PR description accordingly. |
If an action is pending in C code, young_limit forces to take the expensive path every time, until the pending action is cleared. To avoid this, we check against a new limit young_limit_c instead of young_limit in Alloc_small_aux from now on, used to record the next implicit or requested major slice and minor collection. To make sure memprof samples the blocks, we record caml_memprof_young_trigger in young_limit_c. * I checked with Godbolt that the new branching inside Alloc_small_limit is erased at compilation time. This is true for all compilers proposed by Godbolt (including old ones) except ppci and cc65. In addition, all compilers give identical output as if there was no branching with -O0, except msvc which sometimes required -O1 to remove a noop. * There is a new call to `caml_notify_action` inside `caml_alloc_small_dispatch` via `caml_set_action_pending` that fixes the behaviour where action_pending might be set but the program not notified when returning to OCaml code. This is also used to stop exporting caml_something_to_do later on. Without the rest of this commit, the change would slow down all C allocations up to the next explicit poll.
* Preparations for masking. * `caml_something_to_do` was declared as internal but was exported and used in ocamlcc (among all opam packages). This usage in ocamlcc is superseded by the official interface `caml_process_pending_actions_exn`, and `caml_check_pending_actions`.
This is a flag to control the delaying of asynchronous callbacks (signal handlers, finalisers and memprof callbacks). We distinguish two kinds of asynchronous actions: - unrestricted asynchronous callbacks: possibly-raising signal handlers, memprof callbacks and finalisers (including GC alarms). - non-raising actions, such as non-raising signal handlers (e.g. systhread's yield). (The latter is set as a non-raising handler in an ulterior commit.) Masking temporarily delays the execution of asynchronous callbacks. New functions caml_mask and caml_unmask to control which kind of asynchronous callbacks are delayed.
So, it will no longer be delayed by Sys.mask. For simplicity and to avoid changing the definition of Sys.signal_behavior, "non-raising" signal handlers are not exposed through the Sys module, but are accessible through the new primitive.
This polling was introduced at 51c55b2 to "preserve the semantics of the program w.r.t. exceptions". The implementation was a bit hackish since it needed to poll again right after executing callbacks. This is no longer needed for this purpose now that minor allocations poll in bytecode (since 4.10). Furthermore, a signal arising right before the instruction cannot be distinguished from a signal arising just after. Removing this polling simplifies the code, but also prevents looping when recording a signal from a signal handler (which is very useful to purposefully explore all polling locations of a function, e.g. in testing, as used in subsequent commits).
Remove caml_process_pending_signals_exn, replaced by calls to actions. Then remove signals_are_pending which was used as a cache to make the test inside caml_enter_blocking_section efficient. As a side-effect, this permits to record a signal from a signal handler, very useful to have robust tests. We add such a test as a control for tests in subsequent commits.
It tests a possible future Fun.with_resource similar to Haskell's bracket.
|
@stedolan is this work that you would like to eventually seen integrated in OCaml? (Obviously this PR was against the OCaml 4 runtime and would need to be rewritten on top of of OCaml 5. But before considering doing this, it would be useful to know if the feature has any chance of being integrated at all. This requires work, and who would do it?) |
Asynchronous callbacks (signal handlers, finalisers, memprof callbacks) can raise exceptions “out of thin air”; so-called interrupts (SML). This PR implements the masking of asynchronous callbacks which enforces critical sections in which asynchronous exception are guaranteed not to appear, as suggested in Dolan et al., Concurrent System Programming with Effect Handlers, §4.1.
The commits also have detailed logs; this PR is best read commit-by-commit.
1. Motivation
Currently, OCaml has two incompatible dialects: a dialect with asynchronous exceptions, very popular within a small community of developers and researchers of language tools (see #8873 (comment)), and a dialect with more-or-less reliable resource management (using
unwind-protect-based abstractions when possible), popular within a small community of systems programmers. The dialects are incompatible because asynchronous exceptions make it currently impossible to reason about clean-up of resources, and both the requirements of asynchronous exceptions and resource management permeate whole programs. This work a pre-requisite to any attempt to reconcile the two dialects.The problem is going to be worsened in OCaml multicore, which introduces a primitive notion of resource (fibers) that must be disposed-of explicitly and reliably: failing to do so (as of now) leaks the memory allocated for the fiber, and more importantly any resources "owned" by the fiber (i.e. freed via the exceptional termination of the thread). Asynchronous exceptions were correctly identified as a difficulty in Dolan et al.
The end goal (of which this is only a technical step) is to provide a resource-management and exception-safety model for OCaml which is robust in the face of asynchronous exception, at no added (developer, performance) cost compared to other kinds of unexpected exceptions and serious errors.
In this patch, we provide a way to switch for a short amount of time to a dialect that gives some C-style all-explicit control flow but where one accepts to be more careful about what one executes. The motivation for a dynamic switch is as follows: as a first approximation, one can imagine that this control could be provided by some static annotation on functions that instructs the compiler to remove opportunities for asynchronous callbacks to run. Allocations in particular would be replaced by new variants that do not run callbacks, similarly to what allocations in the C API currently do. In this first approximation, it would be possible to write safe resource-handling functions as desired, but at the cost of not being able to re-use existing code: one would have to write a non-polling variant of functions on lists, implement variants of runtime functions for arrays and system calls with a different polling behaviour, etc. It forces code duplication. To avoid this, we use a run-time switch understood by existing code, instead of static annotations. The role of a run-time switch is therefore to provide a polymorphism of polling behaviour for library code.
2. Overview
This PR includes a new per-domain state
Caml_state->mask_async_callbacks, C functions to switch it (caml_mask,caml_unmask), a higher-order combinator (Sys.mask) in OCaml, and arranges the runtime so that it is easy to explore all polling locations for testing purposes. They are all meant for low-level use; the development of high-level abstractions (e.g. #8962) is outside of the scope. It introduces a distinction between possibly-raising and non-raising signal handlers, to exclude the Systhread tick from being delayed.Optionally, it also includes a fix to a performance bug with C allocations when a signal is pending, and a combinator
Sys.nonpreemptible_mask.3. Design
A function
Sys.maskis added, which guarantees that no asynchronous exceptions arise while it executes its arguments. It does so by delaying the execution of any possibly-raising asynchronous callbacks. Functionscaml_{,un}maskare also provided for the FFI user. The tests demonstrates a toy implementation of an unwind-protect combinator safe of asynchronous exceptions (with_resourceas proposed in #8962) inspired by GHC'sbracket(it is essentiallyFun.protectbut adapted to protect against interrupts, for instance no exception should arise from resource release ever). Sys.mask can also be used on its own to solve issues such as presented in #8794.The terminology
maskcomes from Reppy et al's work on interrupts in SML (https://people.cs.uchicago.edu/~jhr/papers/1990/cornell-tr-1144.pdf) and GHC (https://www.microsoft.com/en-us/research/publication/asynchronous-exceptions-haskell-3/). There is some delta between the paper and actual GHC. GHC has low-level scoped combinatorsmask(for interruptible mask) anduninterruptibleMask, that both provide an additional argumentunmask, and which are then used to define higher-level abstractions such asbracket. We discuss the distinction interruptible/uninterruptible mask, and the unnecessity ofunmaskbelow.Sys.maskneeds sometimes to be used in combination with the polling behaviour of OCaml to provide useful critical sections (cf. the various comments(* BEGIN ATOMIC *)to see this combination in the testsuite). (The clarification of this polling behaviour is outside of the scope of this PR, see "Critical sections" at the end.)We treat the systhread tick (a pseudo-signal which causes a context-switch every 50ms with system threads) as a special case. In principle, resource acquisition and release can call blocking system calls, in which case one wants to switch context (as performed inside
caml_enter_blocking_section). So reacting to the systhread tick, which never interrupts the running thread with an exception, is always valid inside the mask. On the opposite side, I thought of the possible consequences of blocking the systhread tick inside the mask, while necessarily keeping context-switch insidecaml_enter_blocking_section. Programmers might then be tempted to useSys.maskto control the context-switching behaviour to ensure that no asynchronous or concurrent code runs at all (e.g. for races on mutable data). But this would be offering a dangerous interface for this job, given that it gives relevance to whether a function calls or notcaml_enter_blocking_section(transitively), something that should remain an implementation detail permitted to change over time. The solution retained is to never block the systhread tick insideSys.mask, and in addition to offer for discussion an optionalSys.nonpreemptible_maskbetter designed for that purpose (see below).vs. unmask
By design, an
unmaskcombinator as suggested in Dolan et al. is not proposed with this PR. Instead ofunmask, the proof-of-conceptwith_resourceis implemented by combining masking with OCaml's predictable polling behaviour. The latter ought to be strengthened and simplified with an explicit construct to control polling points, currently missing. But I have found that this technique makes reasoning about polling and resource safety easy, whereas there are things that I do not see how to do with an unmask combinator alone easily, like implementing the moving of resources in a way that is async-safe. Things like making a tail call after moving a resource are likely to be impossible with unmask on its own due to its scoped nature.Non-raising handlers
As explained, we treat the systhread tick and blocking sections similarly since they are similar from a context-switching behaviour. Since the signal handler does not raise exceptions, it is allowed to run inside masks. This is implemented using a new notion of non-raising signal handler.
For backwards-compatibility reasons (avoiding to change the definition of
Sys.signal_behavior), the interface to install a non-raising signal handler is not exposed via the Sys module, but remains available via a C primitive.This could be left for explicitly undocumented low-level use, or one could decide on the contrary to have more kinds of non-raising callbacks that are allowed to run inside a mask in the future. (It could be desirable for instance to run finalisers that do simple release inside uninterruptible masks, since it would be realistic to write them carefully so that they do not raise, but an audit of finalisers in the wild has shown that changing this for all finalisers would be a largely-breaking change, so I did not explore this path.)
Non-preemptible mask
I propose for discussion the natural extension of
Sys.maskto a variantSys.nonpreemptible_mask, which disables all callbacks and context-switches for the duration of the scope. It offers the guarantee guarantees that no OCaml code runs asynchronously or concurrently. Accepting or not this part of the PR is matter of including or not the relevant commits.One lesson of OCaml's cooperative concurrency libraries is that some people find it easier to control where context switches may happen, in the context of concurrency for IO. In addition, some OCaml code, for instance core's nano_mutex, take advantage of the absence of context switching and asynchronous code in allocation-free code to program efficient concurrency primitives.
Sys.nonpreemptible_maskextends this capability to code that may allocate. On the one hand, avoiding allocations seems to have been sufficient for this purpose so far. On the other hand, OCaml programmers might have some experience of current limitations to report, or could find some other creative uses.The programmer would rather avoid calling blocking calls inside
Sys.nonpreemptible_mask, and the failure mode can be discussed. For simplicity, this PR unconditionally fails the call tocaml_enter_blocking_sectionwith an exception. While it is much better to unconditionally fail than to silently fail (by doing the context-switch), it is still desirable to keep an implementation detail the fact of whether a function transitively asks to release the runtime lock, and it would be possible to fail with an exception on some blocking calls whilst allowing calls tocaml_enter_blocking_sectionin general without doing the context-switch. Depending on discussions, this would be done in a later PR.vs. Interruptible mask
GHC also implements an "interruptible mask" in which asynchronous exceptions are masked but blocking operations may still be interrupted. The interruptible mask sounds like the appropriate choice for resource acquisition, during which, unlike resource release, exceptions are allowed if they can be expected (strong exception-safety), and one is likely to find blocking operations during acquisition that need to be interruptible.
However, this is more a question of library design for explicitly-interruptible blocking operations. During the work on EINTR-based signals #9722, we have seen that there are not many operations in the standard library that both explicitly poll on EINTR and then restart if no signal handler raises. For many blocking calls, running inside an uninterruptible mask will simply cause an
Unix(EINTR,_,_)exception that the caller must catch, who can then decide to poll explicitly from OCaml. (This will require an explicit polling function that ignores the mask, not yet included.) For the remaining ones it might be appropriate to think of another interface to request explicit polling. This is less a question about the semantics of masking, and I prefer to postpone it to a later discussion (such as #8962) where it will still be possible to add an interruptible mask if it turns out to be needed.(Thanks to @stedolan for an extensive discussion on this point.)
4. Implementation
When an asynchronous callback needs to run,
action_pendinginsignals.cis set, and the runtime is notified viacaml_notify_action. Notifying the program currently means settingyoung_limittoyoung_alloc_endso that the next allocation triggers the gc. To make masking efficient, one does not change how polling is done, but how the program is notified. Only if the newCaml_state->mask_asynchronous_callbacksis below the value ofaction_pendingdoes the runtime notify the program. The program is also notified when unmasking if needed. Various levels of masking are provided, depending on whether they also delay non-raising callbacks.More details are given in the commit logs.
Testing improvements
Testing the correct polling behaviour in OCaml is complicated. Tests based on sending signals from one thread to another are slow, brittle and nondeterministic. A few things are added to simplify testing, some primitives and notably some arrangements are made so that it is possible to re-record a signal from its own signal handler, allowing to explore all possible polling locations in a deterministic way.
Critical sections
The implementation of
Sys.maskshows several critical sections ((* BEGIN/END ATOMIC *)) that cannot currently be enforced in bytecode and flambda. They are there to specify that the masking extends to the receiving and the returning of values, a fact described in the documentation. This is a lesson from implementing thewith_resourcecombinator: Sys.mask works better if it combines well with non-polling sections on the caller side. But to observe this property, one needs to be able to enforce such critical sections in the caller's end in the first place. In backends where such critical sections cannot be enforced, there is nobody to observe the property. Therefore, the specification is vacuously true. Correct uses of Sys.mask in the meanwhile either carefully inline or solve simpler problems (e.g. #8994).young_limit_c
An inefficiency has been noticed in the current polling mechanism in C, leading to the introduction of
Caml_state->young_limit_c. This is mostly orthogonal to this PR. This is needed in this PR only due to the decision to not export action_pending, but the fix stands on its own to solve a problem that is already there. Godbolt has been used to check that the changes to the allocation macro do not affect generated code (more details in the commit log).