Implement quality treatment for asynchronous actions in multicore (1/N)#11095
Implement quality treatment for asynchronous actions in multicore (1/N)#11095sadiqj merged 6 commits intoocaml:trunkfrom
Conversation
b3385d4 to
6dde077
Compare
|
Some tests were failing for probably non-trivial reasons, so I had to cut this PR to the first 4 commits. The rest will need to be included with some of the subsequent commits in a separate PR. |
|
After trying to have only the first 4 commits, there are still tests failing concerning the polling mechanism in the intermediate state. So we will have to consider a larger set of commits after all. I have now included commits up to |
|
The |
c9bbd34 to
de50733
Compare
|
Re I have strengthened the test so that it showed the failure more often. |
de50733 to
ee2e3d6
Compare
|
@sadiqj I have now fixed the bug in this PR and the other one and expect green CI (except for the unneeded change entry). Let's try to get a small set in, you have essentially already reviewed this one. |
| achieves the proper synchronisation. */ | ||
| atomic_exchange(&dom_st->young_limit, (uintnat)dom_st->young_start); | ||
| if (caml_incoming_interrupts_queued() | ||
| || atomic_load_explicit(&dom_st->requested_external_interrupt, |
There was a problem hiding this comment.
I think this needs caml_check_pending_signals. Aside from allowing for asynchronous actions follow-ups, the value from this reorganisation should be that signals don't get delayed if they arrive at an inopportune time.
There was a problem hiding this comment.
If I understand correctly, caml_update_young_limit as implemented here will keep young_limit in interrupt-triggering state if a STW request is still pending, or a "requested external interrupt". On the other hand, signals and explicit GC requests are not checked -- even if some are pending, young_limit will be reset to young_start. I'm not sure I understand the reasoning for behaving differently on these different types of interruptions -- a documentation comment would help.
|
It is up to you. I can add more commits from the full PR. I do not think this PR introduces bugs and I can quickly follow up with a second PR with some of the commits that you wanted we go slower with. |
Can we add the |
ee2e3d6 to
0e75143
Compare
Apologies, I missed this reply. I may be misunderstanding but I'm not sure why the preceding commits are necessary for the Lines 1222 to 1224 in 0e75143 This would avoid the situation we have currently where I think it's possible to race a signal, overwrite young limit and delay its processing until the next GC interrupt. |
|
At this stage this would cause a loop during GC. The preceding commits change the loop, making it possible to progress even if signals are not treated immediately (see their commit log). |
Got it, thanks. The loop in With those two additional commits my understanding is that this PR would get us back to not raising exceptions when we enter the GC from C. Is that correct? |
|
Unfortunately not. This is the commit Delay finalisers and reimplement [caml_process_pending_actions_exn]. I can quickly follow-up with a PR that contains it (along with the signal change). |
|
5e5fa00 looks good too, should we pull that in as well? It'd be nice to tie up the async exception safe work in one PR. |
|
No, this will require earlier commits. I do not think that we are going to change the order of the commits, it works as it is and it is not easy to do. |
0e75143 to
df7958b
Compare
|
Just rebased. |
|
I'm trying to have a look at this PR. (I wanted to see if my recent foraging into runtime code made me more able to review this sort of work; it looks like I was overly optimistic about that). The first two commits are fine but the third commit is already quite technical :-) So, about commit d365a92 :
|
runtime/domain.c
Outdated
| atomic_store_rel(&domain_state->young_limit, | ||
| (uintnat) domain_state->young_start); | ||
| domain_state->young_ptr = domain_state->young_end; | ||
| caml_update_young_limit(); |
There was a problem hiding this comment.
I'm not sure about this change, it is now running some pretty involved logic at a point where we know we cannot be interrupted, and we are just initializing the minor heap state.
There was a problem hiding this comment.
The principle of caml_update_young_limit is that you can call it whenever you want since it always sets the young_limit to its necessary value. Actually you must use it to do any modification to the young_limit, because otherwise you can race with an interrupt and miss it (later on with signals).
This has nothing to do with triggering an interrupt, which would be a bigger problem if there was a risk of it happening in this part of code.
There was a problem hiding this comment.
Well caml_update_young_limit probably assumes something about the state of the rest of the runtime (that various bits have been initialized correctly), and it's not obvious that these assumptions hold in this code that is in the process of setting up the minor GC. For example, in the current runtime code the code path here is called before the following later initialization code:
domain_state->requested_major_slice = 0;
domain_state->requested_minor_gc = 0;
domain_state->requested_external_interrupt = 0;Are we sure that your code can run with those fields uninitialized? (Or possibly reused from a previous domain, etc.)
On the other hand, the code we are discussing allocate_minor_heap can also be called mid-program (when resizing the minor heaps), so the problem of a signal coming at this time should indeed be considered. Still, I think that the more correct approach is to delay this caml_update_young_limit call until the domain is in a well-defined state, and set it more simply (possibly using a dedicated caml_init_young_limit function if you want to keep visibility over all writes, and possibly with NULL, etc., your choice) in the meantime.
There was a problem hiding this comment.
Are we sure that your code can run with those fields uninitialized? (Or possibly reused from a previous domain, etc.)
You probably can (it is always fine to force an allocation failure spuriously). However certainly unintended (and UB) to rely on uninitialised memory. I'll have a look.
There was a problem hiding this comment.
The domain state is allocated with calloc, so memory is initialized in every case. Therefore I keep the call to caml_reset_young_limit to serve the other code paths.
Upon recycling some domain state, the interrupt variables are reinitialized later than the current call to caml_reset_young_limit. This is harmless, however it is cleaner to add an additional call at the end of domain initialization, as in the new commit.
|
Looking at the redefinition of First, "interrupt" can mean one of three distinct things (in a subset relation).
GC interrupts are checked through a single test in the hot path of the mutators, which compares |
gasche
left a comment
There was a problem hiding this comment.
A new round of comments and questions based on the terminology and classification I propose in #11095 (comment).
| atomic_store_rel(s->interrupt_word, (uintnat)(-1)); | ||
| } | ||
|
|
||
| int caml_incoming_interrupts_queued(void) |
There was a problem hiding this comment.
If I understand correctly, this function checks for STW interrupts, but not other service interrupts.
| CAML_EV_BEGIN(EV_INTERRUPT_GC); | ||
| if (atomic_load_acq(young_limit) == INTERRUPT_MAGIC) { | ||
| /* interrupt */ | ||
| if (caml_incoming_interrupts_queued()) { |
There was a problem hiding this comment.
If I understand correctly, there is a change of behavior here. Before, the test would check for any service interrupt, now we only check for STW interrupts. This makes sense because the call to caml_handle_incoming_interrupts within the conditional, if I understand correctly, only handle STW interrupts anyway.
There was a problem hiding this comment.
Right, except there is no change of behaviour for the reason you explain.
| } | ||
|
|
||
| caml_update_young_limit(); | ||
| caml_poll_gc_work(); |
There was a problem hiding this comment.
If the conditional above only handles STW interrupts, then all other forms of interrupts should be handled by caml_poll_gc_work, or will not be handled by this function. (This is a bit weird, and ideally there would be some documentation of which interrupts are intended to be handled where.)
From reading the code of caml_poll_gc_work, my impression is that:
- synchronous minor-gc requests are serviced, as the name suggests
- explicit GC requests from users are serviced
- "requested external interrupts" (tick thread stuff) are serviced
- but signals are not serviced at this point
There was a problem hiding this comment.
Please see if the added documentation on the other PR is good enough
| achieves the proper synchronisation. */ | ||
| atomic_exchange(&dom_st->young_limit, (uintnat)dom_st->young_start); | ||
| if (caml_incoming_interrupts_queued() | ||
| || atomic_load_explicit(&dom_st->requested_external_interrupt, |
There was a problem hiding this comment.
If I understand correctly, caml_update_young_limit as implemented here will keep young_limit in interrupt-triggering state if a STW request is still pending, or a "requested external interrupt". On the other hand, signals and explicit GC requests are not checked -- even if some are pending, young_limit will be reset to young_start. I'm not sure I understand the reasoning for behaving differently on these different types of interruptions -- a documentation comment would help.
runtime/domain.c
Outdated
| CAML_EV_END(EV_INTERRUPT_REMOTE); | ||
| } | ||
|
|
||
| caml_update_young_limit(); |
There was a problem hiding this comment.
I'm not sure about the logic here.
I think that what's going on is that we know we received an interrupt, we want to service it (except signals?), and "silence" the interrupt, but we also want to behave correctly if a new interrupt comes while we are servicing the first one. (At least we don't want to discard the notification for this second interrupt.)
I see broadly two approaches to do this:
- We could discard the interrupt notification right away (by setting
young_limittoyoung_start), before doing anything, and then service all interrupts received at this point. If we decide to not service an interrupt, we probably need to restore the notification (by settingyoung_limitagain to a triggering value). If new interrupts come in the meantime, they will notify themselves to be handled next time. - Or we could service interrupts (without touching young_limit`), and then before leaving discard the interrupt notification unless there are new interrupts queued.
Approach (1) feels more robust to me, but I think both could work in theory.
Before this commit, this function was clearly following approach (1): the INTERRUPT_MAGIC test would catch all service interrupts and the conditional would start by resetting young_limit and then handling the STW interrupts (only); and then the caml_poll_gc_work call would deal with other service interrupts and normal GC work. (Signal interrupts would get dropped, which may be unintended?)
After this commit, the code seems to do a weird mix of (1) and (2):
- first we check for STW interrupts and service them, without resetting
young_limit, and only then do we callcaml_update_young_limit; this sounds like approach (2) for STW interrupts. - but after
caml_update_young_limitwe handle all other gc interrupts, this sounds like approach (1) for non-STW interrupts.
This sounds more complex than necessary. Am I missing something?
There was a problem hiding this comment.
From @gadmm explanation of caml_update_young_limit's intended specification in #11095 (comment), I suspect that there is a third approach which he may be considering: if each sort of interrupt is backed by an update to some other runtime state than the young limit, and young_limit is just an efficiency mechanism (it is meant to be equivalent to checking all the other parts in turn), then young_limit is not itself "the notification" as I had assumed earlier, and its value does not matter as long as it is correctly resynchronized with the rest of the runtime state once we are done dealing with interrupts. So there is no need for a global choice between approaches (1) and (2) hinted above, we can let each handler for a category of interrupt deal with its own runtime state, and then refresh the young limit the handlers are done.
There was a problem hiding this comment.
In fact, reading the whole diff, I can see that another call to update young_limit was added in this PR at the end of caml_poll_gc_work, suggesting that this third way is indeed the general approach.
There was a problem hiding this comment.
Yes, this is what I meant in the commit log of the 4th commit (young_limit does not carry information). I think this is how Jacques-Henri meant caml_update_young_limit.
There was a problem hiding this comment.
Ah, I understood your commit message as saying that there is no need to check against a specific INTERRUPT_MAGIC value, instead of any value above the next young pointer.
|
As far as PR #11057 is concerned, the meaningful difference is between delayable actions that may run OCaml code, and non-delayable actions which may not run OCaml code (documented here: 9d83eda#diff-93ee9a0b40685be35da1909f57cba82cfe51d5a8de495947e5068a8b8d8172cbR247-R277). I agree that some remaining terminology in the code would benefit from being clarified. Let's do this afterwards? |
|
Re |
|
The 4.14 code is architected very differently, but Lines 531 to 534 in bfb4b1e which I think corresponds to a On the other hand, continuing down this line of thinking: Multicore uses |
I do not think there is such a meaningful change intended. amd64.S still uses On the other hand your comment reminds me of the off-by-one bug I mentioned at #11095 (comment). If |
|
So if I am following this right, we both agree that there is a small bug here, but it's not in your code, so of course it's up to you to touch it or not, in this PR or another. |
|
I do not think it is a bug ( |
|
I re-read the whole diff, instead of per-commit, and I think I now understand what's going on and agree that the proposed changes are correct. But I'm still nervous about this call to |
|
Re: strategy. It is difficult to structure a PR for such a large change. There is not much more to explain about what happens in each commit in the commit logs (they really do not do much each); what is missing from the commit logs is perhaps the overall picture (for this I have some documentation in code in later commits). This is made more complicated by the fact that the reference for this change is the pre-multicore code. It is a bit as if you were reviewing my rebasing of the old code on top of multicore. I recommend getting sense of the other PR first, with this point in mind. This might be easier for people familiar with the pre-multicore code. I do think, though, that the commits are individually sound-enough that a reviewer can make sure that no bug or regression is introduced. This is why I think that this prefix PR, though not necessary, can be reviewed independently on a correctness basis once you are convinced that it is good to get back closer to the pre-multicore mechanism. In particular for design comments on individual commits, I recommend cross-checking with the final state of the PR to see if a later commit does not resolve the issue. |
| CAML_EV_END(EV_INTERRUPT_REMOTE); | ||
| } | ||
|
|
||
| caml_reset_young_limit(Caml_state); |
There was a problem hiding this comment.
Reading the whole diff, it's subtle that there is a call to caml_reset_young_limit here and also one at the end of caml_poll_gc_work called just below. Naively I wondered if we could keep just the reset at the end.
Here is my understanding of what is going on:
- It would be wrong to just remove the reset here, because it would make
caml_poll_gc_workempty the minor heap, even in the case where the limit was triggered only due to a STW interrupt and no GC work is in fact necessary. - But we could remove the reset if there was a call to
caml_reset_young_limitat the end ofcaml_handle_incoming_interrupts. From a distance it sounds like that would be clearer than the current code. @gadmm, why did you add the reset here instead?
There was a problem hiding this comment.
It is there because the next line can still run OCaml code (finalisers) at this point. It is removed entirely in a later commit.
There was a problem hiding this comment.
Okay. I'm still not sure why it's not in caml_handle_incoming_interrupts instead, but it doesn't really matter for transitory code, it's correct either way.
For signals it comes in a later PR. For requested minor & major, this comes from the fact that this is indeed only requested synchronously, and caml_reset/update_young_limit is only called after they had an occasion of being serviced (AFAIR). However this is more restrictive than how I advertised caml_reset_young_limit—that you can call it anytime. Would you like me to add a check for requested minor & major there? |
I think that it would be a good idea if there is no performance impact. It makes the whole thing easier to reason about and specify (young_limit is the conjunction of young_start and all interruption sources). But if you deal with signals in a later PR, it is of course fine to deal with GC requests in another PR as well. (Or: we could implement synchronous user GC requests with another mechanism that does not try to pass as an asynchronous interrupt, for example by calling an inner function directly. That alternative approach would also make the whole thing more consistent and easier to specify.) |
I am not sure what you mean. It is a request that is meant to be delayed until the subsequent safe point. |
Sorry, that was wrong indeed. I thought (wrongly) that |
|
(At this point I'm again in the state where my question on initialization is the only remaining one, that I know of, before approval.) |
The output has been disassembled (gcc -O2, arch=x86-64) to confirm that `CAMLunlikely` works as intended and produces equivalent code as before.
This function ensures that young_limit is always reset to its desired value, without risk of race. Also avoid relying on an information-carrying value for young_limit.
Aim to take all requests into account to simplify reasoning Reset young_limit after reinitialisation of interrupt variables
at least to me (the absence of race in the assertion depended on who calls it)
751a6ae to
80b8432
Compare
|
Thanks, I have cleaned-up the history. I'll go with a no-change-entry-needed tag for this one, since I cannot think of a discernible change in behaviour it introduces. |
|
Thanks for your perseverance with this @gadmm ! |
|
Thanks ! |
|
I think that you have inadvertently squashed the history. |
Oh dear, I did. Sorry. I also noticed the tests had failed on trunk. I don't think that's related though (it seems to be the intermittent weak lifetime parallel failure. Odd that it happened on two of the testsuites though..) |
Here is a first batch of commits from #11057 following @sadiqj's review.
No change entry needed.