Guarantee that allocation functions do no trigger callbacks when called from C#8691
Guarantee that allocation functions do no trigger callbacks when called from C#8691stedolan merged 3 commits intoocaml:trunkfrom
Conversation
5e2707c to
f542988
Compare
5cea13a to
6b61700
Compare
|
Generally I like the new implementation; I looked at it now and I haven't found anything to nitpick about. It's a lot of code to review and I haven't looked at everything in details. Here is one way to think about postponing that answers some of my interrogations about purging the postponed queue:
With this point of view, if one of the callbacks decides to disable sampling, then purging the rest of the postponed queue makes sense: those allocations have morally not happened yet, they will happen after sampling is disabled. On the other hand, if one of the callbacks decides to re-enable sampling (maybe with a different frequency and callback), then we have lost some valuable information: the allocations between that callback and the return to the OCaml world will not be observed. There is no good way to deal with changing the frequency parameter in the postponed stack: the only correct way to deal with this would be to keep track of all the allocations that happen to be able to re-sample them, but this is not practical nor in fact possible (if the minor heap gets collected in the meantime). I'm sort of curious to understand how the Here would be a possible recommendation:
Note: |
|
I agree this is unfortunate that calling
Actually, this could more or less be implemented, provided we know whether or not we are running a callback, and provided we have a copy of the
This means that we could actually implement a wrapper module implementing everything |
|
Yes, an earlier version of my comment suggested to just let the user check if sampling is currently suspended, and I think that would also be fine. I agree that this is all independent from the purpose of the current PR. (The relation is that now we use the postponing mechanism more.) |
stedolan
left a comment
There was a problem hiding this comment.
This looks great! Knowing that caml_alloc will never be interrupted by user code is a significant simplification of the C API. (I think this probably goes most of the way towards solving #7503). I have the usual pile of comments, but nothing serious.
I'm wondering a bit about the postponed queue resizing logic. As far as I can see, we should never have more than a few postponed samples outstanding, unless we have a long-running loop in C that never calls caml_check_urgent_gc or similar. If we do have such a loop, is growing the postponed queue without bound the right thing to do? It might be better to use a fixed-size buffer and just drop old samples if it overflows. (I'm not sure either way: I don't think there's any terribly nice thing to do in this situation).
There are now three types of asynchronous callback into OCaml code: signal handlers, finalisers and memprof callbacks. As far as I can tell, in any situation where it's OK to run one of these three it's OK to run the others, so I think anywhere we run one but not the others is probably a bug. In particular, I think caml_check_urgent_gc will check for postponed memprof callbacks and pending finalisers, but not for pending signals.
runtime/minor_gc.c
Outdated
| if(!(flags & CAML_FROM_CAML) && old_final_enabled) { | ||
| caml_final_enabled = 1; | ||
| caml_final_check_to_do(); | ||
| } |
There was a problem hiding this comment.
The presence of this dance to ensure that caml_gc_dispatch does not invoke finalisers suggests that caml_gc_dispatch has the wrong API. Perhaps caml_gc_dispatch should never invoke finalisers, and there should be a separate function that calls caml_final_do_calls, caml_memprof_handle_postponed and caml_process_pending_signals?
Also, caml_final_check_to_do looks like very similar logic to caml_update_young_limit, should these be the same function?
There was a problem hiding this comment.
Also, caml_final_check_to_do looks like very similar logic to caml_update_young_limit, should these be the same function?
Hum, they seem really different to me. Perhaps you got confused with function names?
There was a problem hiding this comment.
No, I really meant it! Both of these functions check some global state, and if the global state indicates that async callbacks need to be run, they set caml_young_limit to caml_young_alloc_end (in native code mode). Of the three types of async callbacks (signals, memprof and finalisers), it's odd to me that the status of two of them is checked in caml_update_young_limit while the status of the third is checked here.
This API means that calling caml_update_young_limit is insufficient to correctly update the young limit: if finalisers may be pending, I must call caml_final_check_to_do as well.
I am planning to reuse this queue to simplify much the code for sampling in unmarshalled data. In this case, we can really have a large number of samples accumulating, and I really think this is the right thing to do. That said, I understand your worries. But I don't think there is anything good to do in this case. Sure, there might be some memory leaks if the C code is never giving back the control to OCaml code while still allocating/deallocating a lot of memory. But do we really expect any C binding to do that?
Indeed. I tried to factorize all these calls in one function, I also changed the code of |
|
BTW, the fact that the major GC is calling hooks that are allowed to allocate and execute arbitrary OCaml code (see comment in Would it be possible to add the restriction that these functions should not call an OCaml callback? Where are these functions used, in practice? Here is the list: |
Right. I agree that there's nothing good to do in this case, and I don't expect C bindings to act like this. But the code for handling buffer overflow only exists to handle this case, so I thought it might be simpler to remove it and drop samples instead (which is arguably preferable to allocating without bound here). However, I hadn't thought of the unmarshalling case. I agree that that's a case where many samples might accumulate, and you really do want to run the callbacks. (At some point I'd like to make marshalling interruptible, but that's not happening soon) |
I didn't know about these functions either! I think it would be possible to add this restriction. I can only find two uses of this feature: one in Since this is nominally a breaking change, I think there should be a seperate PR which just edits the comment in |
|
Now that #8711's been merged, does this mean that the new loop in |
545fb71 to
fce0747
Compare
…n OCaml heap from C code. The finalizers and all the other asynchronous callbacks (including signal handlers, memprof callbacks and finalizers) are now called in a common function, [caml_async_callbacks]. It is called in [caml_check_urgent_gc] and wherever [caml_process_pending_signals] was called. This makes it possible to simplify the [caml_gc_dispatch] logic by removing the loop it contains, since it no longer calls finalizers.
This makes sure that: - Callbacks are never called when another is running - The postponed queue is purged when setting memprof parameters We now use a FIFO implemented as a circular buffer for remembering of postponed blocks.
fce0747 to
0ca84f5
Compare
Indeed. I removed the loop, squashed the commits and made a bit of cleaning. |
stedolan
left a comment
There was a problem hiding this comment.
New version looks good to me!
|
Wait a bit, I am still planning a small change. |
|
See the latest commit. I think it answer you concern about |
|
Minor nitpick: It's not clear to me why you moved |
They are not signal-specific, but don't have to do much with the minor GC as well... My justification for moving them to |
…er that callbacks are pending. This make us able to get rid of to xxx_to_do variables in `final.c` and `memprof.c`. The variable is reset to 0 when entering `caml_check_urgent_gc`, which is now the main entry point for asynchronous callbacks. In case a callback raises an exception, we need to set it back to 1 to make sure no callback is missed.
9ef2889 to
42ab59a
Compare
You have a point that finalisers have polled signals themselves. However, this does not make it a polling location, this only reassures that making it a polling point does not provoke a huge breakage. This still changes the behaviour from "once in a blue moon (major gc cycle), we run the risk that something unexpected can happen" to "we are prepared that every time, something unexpected can happen". What you describe is such a rare and unreproducible combination of circumstances, for a kind of bad effect (leak) that can be so hard to notice or trace back to a cause, that I am not entirely convinced that wrong uses of check_urgent_gc have all been reported. I prefer to see it this way: since signals could only have been polled via finalisers, then we know that no code in the wild relies on signals being polled in check_urgent_gc. Same with memprof callbacks which didn't exist. If we turn it into a polling point now, then we are going to lose this information. So since the general theme of the PR is to provide a better control on when asynchronous callbacks run, then let's add this guarantee to functions that call check_urgent_gc internally (including those in the wild). If there is any issue with the call to finalisers being after all important, then I would have a bigger worry that this also endangers the premiss of this PR: that it it safe to remove calls to finalisers from allocation functions in C.
Does #8897 count? From what I see, check_urgent_gc is used in the runtime to perform gc-related operations, such as just after creating many major-to-minor pointers. That's also true of lablgtk where it has to do IIRC with making sure that a minor collection has been done. The name also does not really refers to asynchronous callbacks (as objected by @dbuenzli in the other thread). Only with ephemerons I would not exclude that has to do with running finalisers, but if that is the case then we can always replace it with a call to process_pending_events, and then we will have to audit it to make sure that it is robust for asynchronous exceptions across all callers (and for future-proofness, have it return the exception instead of raising it immediately).
I am not sure if I understand correctly: this is a change unrelated to the topic of the PR, and it has not been described nor discussed in this review, right? I like the direction, this is a prerequisite for bringing polling in bytecode in line with polling in native after the Better Safe Points change—though I cannot comment on whether this is safe for the interpreter (and you have your point that signal handlers already run inside finalisers). But this is not a minor change, and it seems to me that it would have been easy to include it in a separate commit, and clearly describe how and why the behaviour of interpreted programs is changed, so that people know that the behaviour is changing. Also, is it mentioned in the Changes file? |
Alright, I did a search on GitHub, and it appears indeed that the uses of these functions in the wild are rare (I found that only in LablGTK and OCamlnet), and those uses are only in allocation related functions, and not really for the purpose of polling e.g., signals. That said, it seems like in both cases it is safe to run async callbacks from So, I agree with you that, for the sake of uniformity with other allocation functions in the OCaml runtime, we could disable async callback polling in Perhaps the simplest solution is that you simply change the default behavior of
You are right that this has not been discussed much, and that this change has not been documented well. I now remember better the reason: even though the design has been chosen to make #8805 easy, there is a more fundamental reason which is specific to this patch: when sampling and allocation, we want, if possible, to call the Memprof callback as soon as possible. One very common case is when allocating in the minor heap from OCaml code, and, in this case, we can arrange to call the callback immediately during the allocation. Now, if we allow running a Memprof callback in this context, then there is no reason we don't allow any callback.
Do you really think one can easily see the difference? Remember that we do polling in bytecode mode at any function call and loop iteration! Also, do you think that nowadays many people write native-incompatible OCaml code? I can add a |
I was thinking about that. I will take care of it.
Relatedly, a very concrete question I have (see #8961 and 2b1bdd4) is whether you see a use in being able to raise from a memprof callback, and if not, whether it would be preferable to completely forbid any exception arising from them. The reason is that there is almost certainly going to be a distinction between raising signal handlers and non-raising ones, in order to be able to yield() properly inside resource acquisition and release functions (I will have to ask the language designers if they have other solutions in mind, but I am quite confident that his is the way to go–please go to #8961 if you wish to contribute to the discussion). The answer is going to determine whether memprof callbacks can run inside acquisition and release functions: if we forbid exceptions arising from memprof callbacks, then we can call them during (and therefore profile) acquisition and release of resources (cf. the
Another possible answer would be that it is not that bad to delay memprof callbacks, and that we do not care that much (including about callbacks delayed by the thread yielding).
It probably deserves an entry in the Changes file because is it a separate change, not sure about the '*' though: I agree that if correct, then the change is probably unnoticeable for the user. I expressed myself poorly, I was saying that people who develop OCaml can be interested in keeping track of the semantics of the language. For instance, people involved in BSP can be interested in knowing that there is a better opportunity now to have the same polling behaviour in native and bytecode. (I am so interested in that because it is important for the implementation of resource-management primitives: at the moment I cannot reason about polling points in bytecode, so the implementation is at best brittle, and at worst broken without me knowing and without solutions to fix it.) |
That's something we need to discuss a little more, and I am afraid this is not going to be ready for 4.10. My gut feeling for now is that I don't think exceptions are useful in memprof callbacks, but they may raise OOM/stack overflow exceptions that the user may want to handle... Anyway, Memprof is unlikely to be ready for 4.10, and we will probably do large changes to its API (see #8920) before its release. So, as discussed with @gasche, what we will probably do is simply remove the |
See #9007 |
We were really rather hoping to get Memprof into 4.10. I don't think that a decision about what the behaviour should be when a memprof callback raises an exception is needed in the first version of the feature. We can easily change such behaviour later and anyone whose code was relying on the previous behaviour deserves to have their code break. |
|
There is a lot more work to be done to upstream the entirety of memprof than this specific discussion. Even if we assume that @stedolan will magically do all of the review and design API discussions, 4.10 sounds like a very tight deadline (and also fairly demanding for Jacques-Henri's volunteer time). It does sound more reasonable to me to (do our best to have statmemprof advance at a steady pace, but) not try to force it into 4.10. |
Sure, if its not ready in time then its not ready. I just wanted to make sure that it didn't get delayed by a fairly hypothetical discussion about exceptions coming out of the callbacks. |
|
No problem. There seems to be an agreement that the user should not rely on exceptions coming out of memprof callbacks, but that we have more time on our hands to discuss the implications. |
|
I am still trying to understand what is the semantics for calling
With the current PR, Would it not be more consistent to always call other callbacks alongside memprof callbacks? Also it would be more consistent with native to always call I am not saying that there is a problem with the current code, I am trying to follow your motivations for the change. I noticed a leftover |
You're right. I think there is a bug in
Well spotted. Fix incoming in the PR. |
In ocaml#8691, caml_check_urgent_gc was merged with the function that runs asynchronous callbacks. The rationale was that caml_check_urgent_gc already runs finalisers, and so could have run any asynchronous callbacks. We agreed on a different strategy: we know that users could not rely on asynchronous callbacks being called at this point, so take the opportunity to make it callback-safe, like was done for allocation functions. The new check_urgent_gc no longer calls finalisers (nor any callbacks), and instead two functions are introduced: * caml_process_pending_callbacks : internal function to execute callbacks unconditionally. * caml_process_pending_actions : public-facing function that checks for something to do, and executes all actions (GC and callbacks). Archaeology shows that this matches the behaviour from before the work started (modulo fewer unpredictable calls to finalisers). In particular, some calls to caml_check_current_gc are removed because archaeology showed that it was introduced to replace a direct call to callbacks.
In ocaml#8691, caml_check_urgent_gc was merged with the function that runs asynchronous callbacks. The rationale was that caml_check_urgent_gc already runs finalisers, and so could have run any asynchronous callbacks. We agreed on a different strategy: we know that users could not rely on asynchronous callbacks being called at this point, so take the opportunity to make it callback-safe, like was done for allocation functions. The new check_urgent_gc no longer calls finalisers (nor any callbacks), and instead two functions are introduced: * caml_process_pending_callbacks : internal function to execute callbacks unconditionally. * caml_process_pending_actions : public-facing function that checks for something to do, and executes all actions (GC and callbacks). Archaeology shows that this matches the behaviour from before the work started (modulo fewer unpredictable calls to finalisers). In particular, some calls to caml_check_current_gc are removed because archaeology showed that it was introduced to replace a direct call to callbacks.
In ocaml#8691, caml_check_urgent_gc was merged with the function that runs asynchronous callbacks. The rationale was that caml_check_urgent_gc already runs finalisers, and so could have run any asynchronous callbacks. We agreed on a different strategy: we know that users could not rely on asynchronous callbacks being called at this point, so take the opportunity to make it callback-safe, like was done for allocation functions. The new check_urgent_gc no longer calls finalisers (nor any callbacks), and instead two functions are introduced: * caml_process_pending_callbacks : internal function to execute callbacks unconditionally. * caml_process_pending_actions : public-facing function that checks for something to do, and executes all actions (GC and callbacks). Archaeology shows that this matches the behaviour from before the work started (modulo fewer unpredictable calls to finalisers). In particular, some calls to caml_check_current_gc are removed because archaeology showed that it was introduced to replace a direct call to callbacks.
In ocaml#8691, caml_check_urgent_gc was merged with the function that runs asynchronous callbacks. The rationale was that caml_check_urgent_gc already runs finalisers, and so could have run any asynchronous callbacks. We agreed on a different strategy: we know that users could not rely on asynchronous callbacks being called at this point, so take the opportunity to make it callback-safe, like was done for allocation functions. The new check_urgent_gc no longer calls finalisers (nor any callbacks), and instead two functions are introduced: * caml_process_pending_callbacks : internal function to execute callbacks unconditionally. * caml_process_pending_actions : public-facing function that checks for something to do, and executes all actions (GC and callbacks). Archaeology shows that this matches the behaviour from before the work started (modulo fewer unpredictable calls to finalisers). In particular, some calls to caml_check_current_gc are removed because archaeology showed that it was introduced to replace a direct call to callbacks.
In ocaml#8691, caml_check_urgent_gc was merged with the function that runs asynchronous callbacks. The rationale was that caml_check_urgent_gc already runs finalisers, and so could have run any asynchronous callbacks. We agreed on a different strategy: we know that users could not rely on asynchronous callbacks being called at this point, so take the opportunity to make it callback-safe, like was done for allocation functions. The new check_urgent_gc no longer calls finalisers (nor any callbacks), and instead two functions are introduced: * caml_process_pending_callbacks : internal function to execute callbacks unconditionally. * caml_process_pending_actions : public-facing function that checks for something to do, and executes all actions (GC and callbacks). Archaeology shows that this matches the behaviour from before the work started (modulo fewer unpredictable calls to finalisers). In particular, some calls to caml_check_current_gc are removed because archaeology showed that it was introduced to replace a direct call to callbacks.
In ocaml#8691, caml_check_urgent_gc was merged with the function that runs asynchronous callbacks. The rationale was that caml_check_urgent_gc already runs finalisers, and so could have run any asynchronous callbacks. We agreed on a different strategy: we know that users could not rely on asynchronous callbacks being called at this point, so take the opportunity to make it callback-safe, like was done for allocation functions. The new check_urgent_gc no longer calls finalisers (nor any callbacks), and instead two internal functions are introduced: * caml_process_pending_callbacks : function to execute callbacks unconditionally. * caml_check_urgent_gc_and_callbacks : function that checks for something to do, and then executes all actions (GC and callbacks).
This PR contains two commits:
1- A patch discussed in #8674, which guarantee that allocation functions do no trigger callbacks when called from C. This fixes the unmarshalling bug discussed there, and probably others too, since the C programmer usually doesn't take care about this when calling allocation functions.
2- The commit in 1-required to postpone more samplings in memprof, and uncovered a memprof bug. I therefore refactored the machinery for handling postponned blocks in memprof.