
Guarantee that allocation functions do not trigger callbacks when called from C #8691

Merged
stedolan merged 3 commits into ocaml:trunk from jhjourdan:no_callback_c_alloc on Jun 11, 2019

Conversation

@jhjourdan
Contributor

This PR contains two commits:

1- A patch discussed in #8674, which guarantees that allocation functions do not trigger callbacks when called from C. This fixes the unmarshalling bug discussed there, and probably others too, since C programmers usually don't take care of this when calling allocation functions.

2- The commit in 1- required postponing more samplings in memprof, which uncovered a memprof bug. I therefore refactored the machinery for handling postponed blocks in memprof.

@jhjourdan force-pushed the no_callback_c_alloc branch 2 times, most recently from 5cea13a to 6b61700 on May 28, 2019 09:53
@gasche
Member

gasche commented May 28, 2019

Generally I like the new implementation; I looked at it now and I haven't found anything to nitpick about. It's a lot of code to review, and I haven't looked at everything in detail.

Here is one way to think about postponing that answers some of my questions about purging the postponed queue:

  • One way to think about the system is to imagine that what happens in the C runtime is not observable to the user; they can only see what happens when OCaml code is executed.
  • The purpose of statmemprof is to make (some) allocations observable to the user (by executing some OCaml code when they happen).
  • If a piece of C code allocates, we can imagine that the allocations are not happening yet, or at least that they have not been observed yet. They are observed just before resuming normal execution, by traversing the postponed queue and calling the callback on them -- morally, this is when the allocation happens.

With this point of view, if one of the callbacks decides to disable sampling, then purging the rest of the postponed queue makes sense: those allocations have morally not happened yet; they will happen after sampling is disabled.

On the other hand, if one of the callbacks decides to re-enable sampling (maybe with a different frequency and callback), then we have lost some valuable information: the allocations between that callback and the return to the OCaml world will not be observed.

There is no good way to deal with changing the frequency parameter in the postponed stack: the only correct way to deal with this would be to keep track of all the allocations so as to be able to re-sample them, but this is neither practical nor, in fact, possible (if the minor heap gets collected in the meantime).

I'm sort of curious to understand how the Gc.Memprof.set_ctrl function will be used. If the only use-case is to stop profiling and never enable it again, the current implementation is making the right choice. If we start and stop profiling based on user-input events (sending a special request to the server, etc.), then the current behavior is also fine. But if a user were to try to temporarily suspend profiling in a critical section, then the purging logic is wrong. This is ironic given that sampling is already suspended during callback execution, so there is in fact no reason for users to temporarily disable sampling at this point.

Here would be a possible recommendation:

  • Document to users that Gc.Memprof.stop can lose postponed allocation events.
  • Introduce an interface in Gc.Memprof to temporarily suspend sampling, for example Gc.Memprof.without_sampling : (unit -> 'a) -> 'a -- that would be a no-op when sampling is already suspended.

Note: without_sampling cannot be implemented by end-users without losing events, but more importantly it cannot be implemented at all given that there is no way to get the current exists 'a. 'a ctrl value.

@jhjourdan
Contributor Author

I agree it is unfortunate that calling Gc.Memprof.set_ctrl while running a callback can lead to sample loss. Note, however, that no sample will be lost if this function is called outside of a callback. I added some documentation about that.

Introduce an interface in Gc.Memprof to temporarily suspend sampling, for example Gc.Memprof.without_sampling : (unit -> 'a) -> 'a -- that would be a no-op when sampling is already suspended.

Note: without_sampling cannot be implemented by end-users without losing events, but more importantly it cannot be implemented at all given that there is no way to get the current exists 'a. 'a ctrl value.

Actually, this could more or less be implemented, provided we know whether or not we are running a callback, and provided we have a copy of the ctrl value:

  • If so, then there is nothing to do
  • If not, then run stop/start. This is guaranteed not to lose any sample.

This means that we could actually implement a wrapper module implementing everything Gc.Memprof implements, plus Gc.Memprof.without_sampling. I agree, however, that this is not ideal. This is completely independent of the current PR, though, so let's wait and see whether this is actually useful.

@gasche
Member

gasche commented May 28, 2019

Yes, an earlier version of my comment suggested to just let the user check if sampling is currently suspended, and I think that would also be fine. I agree that this is all independent from the purpose of the current PR. (The relation is that now we use the postponing mechanism more.)

Contributor

@stedolan left a comment


This looks great! Knowing that caml_alloc will never be interrupted by user code is a significant simplification of the C API. (I think this probably goes most of the way towards solving #7503). I have the usual pile of comments, but nothing serious.

I'm wondering a bit about the postponed queue resizing logic. As far as I can see, we should never have more than a few postponed samples outstanding, unless we have a long-running loop in C that never calls caml_check_urgent_gc or similar. If we do have such a loop, is growing the postponed queue without bound the right thing to do? It might be better to use a fixed-size buffer and just drop old samples if it overflows. (I'm not sure either way: I don't think there's any terribly nice thing to do in this situation).
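The fixed-size alternative floated here can be sketched as a small circular buffer that overwrites the oldest postponed sample on overflow. This is only an illustration of the trade-off, with made-up names and a tiny capacity, not the runtime's actual data structure:

```c
#include <stddef.h>

/* Hypothetical sketch: a fixed-capacity FIFO of postponed samples that
   drops the oldest entry on overflow instead of growing without bound. */

#define QUEUE_CAPACITY 4

typedef struct { void *block; size_t occurrences; } sample;

static sample queue[QUEUE_CAPACITY];
static size_t q_start = 0, q_len = 0;

static void enqueue(sample s)
{
  if (q_len == QUEUE_CAPACITY) {
    /* Overflow: silently drop the oldest postponed sample. */
    q_start = (q_start + 1) % QUEUE_CAPACITY;
    q_len--;
  }
  queue[(q_start + q_len) % QUEUE_CAPACITY] = s;
  q_len++;
}

static int dequeue(sample *out)
{
  if (q_len == 0) return 0;
  *out = queue[q_start];
  q_start = (q_start + 1) % QUEUE_CAPACITY;
  q_len--;
  return 1;
}
```

The trade-off stedolan describes is visible here: bounded memory, at the price of losing the earliest samples whenever C code allocates a lot without returning to OCaml.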

There are now three types of asynchronous callback into OCaml code: signal handlers, finalisers and memprof callbacks. As far as I can tell, in any situation where it's OK to run one of these three it's OK to run the others, so I think anywhere we run one but not the others is probably a bug. In particular, I think caml_check_urgent_gc will check for postponed memprof callbacks and pending finalisers, but not for pending signals.

if (!(flags & CAML_FROM_CAML) && old_final_enabled) {
  caml_final_enabled = 1;
  caml_final_check_to_do();
}
Contributor


The presence of this dance to ensure that caml_gc_dispatch does not invoke finalisers suggests that caml_gc_dispatch has the wrong API. Perhaps caml_gc_dispatch should never invoke finalisers, and there should be a separate function that calls caml_final_do_calls, caml_memprof_handle_postponed and caml_process_pending_signals?

Also, caml_final_check_to_do looks like very similar logic to caml_update_young_limit, should these be the same function?

Contributor Author


Also, caml_final_check_to_do looks like very similar logic to caml_update_young_limit, should these be the same function?

Hum, they seem really different to me. Perhaps you got confused with function names?

Contributor


No, I really meant it! Both of these functions check some global state, and if the global state indicates that async callbacks need to be run, they set caml_young_limit to caml_young_alloc_end (in native code mode). Of the three types of async callbacks (signals, memprof and finalisers), it's odd to me that the status of two of them is checked in caml_update_young_limit while the status of the third is checked here.

This API means that calling caml_update_young_limit is insufficient to correctly update the young limit: if finalisers may be pending, I must call caml_final_check_to_do as well.
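The consolidation stedolan is suggesting could look roughly like this: one function that inspects all three kinds of pending work before deciding the young limit. All globals and names below are illustrative stand-ins, not the runtime's real ones:

```c
#include <stdint.h>

/* Illustrative model: the status of signals, finalisers and memprof
   callbacks is checked in a single place, which lowers the allocation
   limit so that the next allocation polls for async callbacks. */

static int pending_signals, pending_finalisers, pending_memprof;
static intptr_t young_limit, young_trigger = 100, young_alloc_end = 1000;

static void update_young_limit(void)
{
  if (pending_signals || pending_finalisers || pending_memprof)
    young_limit = young_alloc_end;  /* force the next allocation to poll */
  else
    young_limit = young_trigger;    /* normal GC trigger */
}
```

With such an API, calling update_young_limit alone would always be sufficient; no separate caml_final_check_to_do call would be needed.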

@jhjourdan
Contributor Author

I'm wondering a bit about the postponed queue resizing logic. As far as I can see, we should never have more than a few postponed samples outstanding, unless we have a long-running loop in C that never calls caml_check_urgent_gc or similar. If we do have such a loop, is growing the postponed queue without bound the right thing to do? It might be better to use a fixed-size buffer and just drop old samples if it overflows. (I'm not sure either way: I don't think there's any terribly nice thing to do in this situation).

I am planning to reuse this queue to greatly simplify the code for sampling in unmarshalled data. In that case, we can really have a large number of samples accumulating, and I really think this is the right thing to do.

That said, I understand your worries. But I don't think there is anything good to do in this case. Sure, there might be some memory leaks if the C code never gives control back to OCaml code while still allocating/deallocating a lot of memory. But do we really expect any C binding to do that?

There are now three types of asynchronous callback into OCaml code: signal handlers, finalisers and memprof callbacks. As far as I can tell, in any situation where it's OK to run one of these three it's OK to run the others, so I think anywhere we run one but not the others is probably a bug. In particular, I think caml_check_urgent_gc will check for postponed memprof callbacks and pending finalisers, but not for pending signals.

Indeed. I tried to factorize all these calls into one function, caml_async_callbacks. There are still a few calls to the individual functions here and there, but these are more "internal calls", used, e.g., for making sure that the queue is empty before continuing.
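The shape of such a combined entry point might be, in rough outline (hypothetical names and counters standing in for real handlers; the actual caml_async_callbacks also has to deal with exceptions raised by callbacks):

```c
/* Rough model: one function drains all pending asynchronous work, and a
   flag prevents callbacks from being run reentrantly. */

static int in_callback;
static int pending_signals, pending_finalisers, pending_memprof;
static int signals_run, finalisers_run, memprof_run;

static void async_callbacks(void)
{
  if (in_callback) return;            /* never nest callbacks */
  in_callback = 1;
  /* Re-check after each round: a callback may set new pending work. */
  while (pending_signals || pending_finalisers || pending_memprof) {
    if (pending_signals)    { pending_signals = 0;    signals_run++; }
    if (pending_finalisers) { pending_finalisers = 0; finalisers_run++; }
    if (pending_memprof)    { pending_memprof = 0;    memprof_run++; }
  }
  in_callback = 0;
}
```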

I also changed the code of caml_gc_dispatch to prevent it from calling callbacks. I think I also uncovered a bug there: when running a major slice, the major GC might call a C hook, which is allowed to allocate and fill up the minor heap. This breaks the guarantee that caml_gc_dispatch leaves enough room for at least one object. That said, I am not really confident in what I did there, so I would be happy if @damiendoligez could have a look at least at the changes in minor_gc.c.

@jhjourdan
Contributor Author

BTW, the fact that the major GC is calling hooks that are allowed to allocate and execute arbitrary OCaml code (see comment in misc.h) completely kills the point of this PR. I did not know about them!

Would it be possible to add the restriction that these functions should not call an OCaml callback? Where are these functions used, in practice?

Here is the list:

CAMLextern void (*caml_major_gc_hook)(void);

typedef void (*caml_timing_hook) (void);
extern caml_timing_hook caml_major_slice_begin_hook, caml_major_slice_end_hook;
extern caml_timing_hook caml_minor_gc_begin_hook, caml_minor_gc_end_hook;
extern caml_timing_hook caml_finalise_begin_hook, caml_finalise_end_hook;

@stedolan
Contributor

stedolan commented Jun 3, 2019

I am planning to reuse this queue to greatly simplify the code for sampling in unmarshalled data. In that case, we can really have a large number of samples accumulating, and I really think this is the right thing to do.

That said, I understand your worries. But I don't think there is anything good to do in this case. Sure, there might be some memory leaks if the C code never gives control back to OCaml code while still allocating/deallocating a lot of memory. But do we really expect any C binding to do that?

Right. I agree that there's nothing good to do in this case, and I don't expect C bindings to act like this. But the code for handling buffer overflow only exists to handle this case, so I thought it might be simpler to remove it and drop samples instead (which is arguably preferable to allocating without bound here).

However, I hadn't thought of the unmarshalling case. I agree that that's a case where many samples might accumulate, and you really do want to run the callbacks. (At some point I'd like to make marshalling interruptible, but that's not happening soon)

@stedolan
Contributor

stedolan commented Jun 3, 2019

BTW, the fact that the major GC is calling hooks that are allowed to allocate and execute arbitrary OCaml code (see comment in misc.h) completely kills the point of this PR. I did not know about them!

Would it be possible to add the restriction that these functions should not call an OCaml callback? Where are these functions used, in practice?

I didn't know about these functions either!

I think it would be possible to add this restriction. I can only find two uses of this feature: one in flow and one in Jane Street's internal code. Neither allocates, performs callbacks, or indeed calls any OCaml runtime functions at all.

Since this is nominally a breaking change, I think there should be a separate PR which just edits the comment in misc.h.

@stedolan
Contributor

stedolan commented Jun 5, 2019

Now that #8711's been merged, does this mean that the new loop in caml_gc_dispatch is unnecessary?

@jhjourdan force-pushed the no_callback_c_alloc branch 2 times, most recently from 545fb71 to fce0747 on June 5, 2019 12:23
jhjourdan added 2 commits June 5, 2019 14:25
…n OCaml heap from C code.

Finalizers and all the other asynchronous callbacks (including
signal handlers and memprof callbacks) are now called in a
common function, [caml_async_callbacks]. It is called in
[caml_check_urgent_gc] and wherever [caml_process_pending_signals] was
called.

This makes it possible to simplify the [caml_gc_dispatch] logic by
removing the loop it contains, since it no longer calls finalizers.
This makes sure that:
- A callback is never called while another is running
- The postponed queue is purged when setting memprof parameters

We now use a FIFO, implemented as a circular buffer, for remembering
postponed blocks.
@jhjourdan force-pushed the no_callback_c_alloc branch from fce0747 to 0ca84f5 on June 5, 2019 12:25
@jhjourdan
Contributor Author

Now that #8711's been merged, does this mean that the new loop in caml_gc_dispatch is unnecessary?

Indeed. I removed the loop, squashed the commits and did a bit of cleanup.

Contributor

@stedolan left a comment


New version looks good to me!

@jhjourdan
Contributor Author

Wait a bit, I am still planning a small change.

@jhjourdan
Contributor Author

See the latest commit. I think it answers your concern about caml_final_check_to_do, and simplifies the code a bit.

@gasche
Member

gasche commented Jun 5, 2019

Minor nitpick: it's not clear to me why you moved caml_check_urgent_gc to signals.c, and why caml_set_something_to_do and caml_raise_in_async_callback are defined there. I would define those functions in minor_gc.c (where caml_check_urgent_gc was before the PR), as they are not signal-specific.

@jhjourdan
Contributor Author

I would define those functions in minor_gc.c (where caml_check_urgent_gc was before the PR), as they are not signal-specific.

They are not signal-specific, but they don't have much to do with the minor GC either... My justification for moving them to signals.c was that the logic that handles asynchronous callbacks is in signals.c, since signals are (one of) the main uses of this mechanism.

…er that callbacks are pending.

This makes it possible to get rid of the xxx_to_do variables in `final.c`
and `memprof.c`. The variable is reset to 0 when entering
`caml_check_urgent_gc`, which is now the main entry point for
asynchronous callbacks. In case a callback raises an exception, we
need to set it back to 1 to make sure no callback is missed.
@jhjourdan force-pushed the no_callback_c_alloc branch from 9ef2889 to 42ab59a on June 6, 2019 14:08
@gadmm
Contributor

gadmm commented Oct 1, 2019

Not quite, since finalizers can allocate anyway and trigger polling of signals. So the previous code allowed the call of signal handlers without letting anyone rely on this, since we cannot know in advance whether a finalizer is going to be called or not.

You have a point that finalisers have polled signals themselves. However, this does not make it a polling location; it only reassures us that making it a polling point does not provoke a huge breakage. This still changes the behaviour from "once in a blue moon (major gc cycle), we run the risk that something unexpected can happen" to "we are prepared that every time, something unexpected can happen". What you describe is such a rare and unreproducible combination of circumstances, for a kind of bad effect (leak) that can be so hard to notice or trace back to a cause, that I am not entirely convinced that wrong uses of check_urgent_gc have all been reported.

I prefer to see it this way: since signals could only have been polled via finalisers, we know that no code in the wild relies on signals being polled in check_urgent_gc. Same with memprof callbacks, which didn't exist. If we turn it into a polling point now, then we are going to lose this information. So, since the general theme of the PR is to provide better control on when asynchronous callbacks run, let's add this guarantee to functions that call check_urgent_gc internally (including those in the wild). If the call to finalisers turns out to be important after all, then I would have a bigger worry: this also endangers the premise of this PR, that it is safe to remove calls to finalisers from allocation functions in C.

But if you show me existing code that assume otherwise, I may change my mind.

Does #8897 count?

From what I see, check_urgent_gc is used in the runtime to perform GC-related operations, such as just after creating many major-to-minor pointers. That's also true of lablgtk, where it has to do, IIRC, with making sure that a minor collection has been done. The name also does not really refer to asynchronous callbacks (as objected by @dbuenzli in the other thread). Only with ephemerons I would not exclude that it has to do with running finalisers, but if that is the case then we can always replace it with a call to process_pending_events, and then we will have to audit it to make sure that it is robust to asynchronous exceptions across all callers (and for future-proofing, have it return the exception instead of raising it immediately).

In #8805, we would like to share the code for minor allocations in bytecode and native modes. So, the CAML_FROM_CAML flag is necessary in caml_alloc_small_dispatch. Also, using it in bytecode mode costs nothing and makes polling more robust, so why not do it?

I am not sure if I understand correctly: this is a change unrelated to the topic of the PR, and it has not been described nor discussed in this review, right? I like the direction; this is a prerequisite for bringing polling in bytecode in line with polling in native after the Better Safe Points change—though I cannot comment on whether this is safe for the interpreter (and you have a point that signal handlers already run inside finalisers). But this is not a minor change, and it seems to me that it would have been easy to include it in a separate commit, and clearly describe how and why the behaviour of interpreted programs is changed, so that people know that the behaviour is changing. Also, is it mentioned in the Changes file?

@jhjourdan
Contributor Author

jhjourdan commented Oct 1, 2019

From what I see, check_urgent_gc is used in the runtime to perform GC-related operations, such as just after creating many major-to-minor pointers. That's also true of lablgtk, where it has to do, IIRC, with making sure that a minor collection has been done. The name also does not really refer to asynchronous callbacks (as objected by @dbuenzli in the other thread). Only with ephemerons I would not exclude that it has to do with running finalisers, but if that is the case then we can always replace it with a call to process_pending_events, and then we will have to audit it to make sure that it is robust to asynchronous exceptions across all callers (and for future-proofing, have it return the exception instead of raising it immediately).

Alright, I did a search on GitHub, and it appears that uses of these functions in the wild are indeed rare (I found them only in LablGTK and OCamlnet), and those uses are only in allocation-related functions, not really for the purpose of polling, e.g., signals. That said, it seems like in both cases it is safe to run async callbacks from caml_check_urgent_gc.

So, I agree with you that, for the sake of uniformity with other allocation functions in the OCaml runtime, we could disable async callback polling in caml_check_urgent_gc by default. Of course, this means that we should use the polling-enabled function everywhere in the runtime it is safe to do so.

Perhaps the simplest solution is that you simply change the default behavior of caml_check_urgent_gc in #8993? That way, we keep using caml_process_pending_events wherever it is safe.
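The split being discussed here can be pictured as two entry points: a GC-only function safe to call from any allocation site, and a polling function that additionally services asynchronous callbacks, used only where that is known to be safe. The names and state below are illustrative stand-ins, not the runtime's actual API:

```c
/* Sketch of the proposed default: check_urgent_gc does GC work only,
   while process_pending_events also runs pending async callbacks. */

static int minor_gc_requested, callbacks_pending;
static int minor_gcs_done, callbacks_done;

static void check_urgent_gc(void)        /* safe from any allocation site */
{
  if (minor_gc_requested) { minor_gc_requested = 0; minor_gcs_done++; }
}

static void process_pending_events(void) /* only where callbacks are safe */
{
  check_urgent_gc();
  while (callbacks_pending) { callbacks_pending--; callbacks_done++; }
}
```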

I am not sure if I understand correctly: this is a change unrelated to the topic of the PR, and it has not been described nor discussed in this review, right?

You are right that this has not been discussed much, and that this change has not been documented well. I now remember better the reason: even though the design was chosen to make #8805 easy, there is a more fundamental reason which is specific to this patch: when sampling an allocation, we want to call the Memprof callback as soon as possible. One very common case is allocating in the minor heap from OCaml code, and, in this case, we can arrange to call the callback immediately during the allocation. Now, if we allow running a Memprof callback in this context, then there is no reason not to allow any other callback.

I like the direction; this is a prerequisite for bringing polling in bytecode in line with polling in native after the Better Safe Points change—though I cannot comment on whether this is safe for the interpreter (and you have a point that signal handlers already run inside finalisers). But this is not a minor change, and it seems to me that it would have been easy to include it in a separate commit, and clearly describe how and why the behaviour of interpreted programs is changed, so that people know that the behaviour is changing. Also, is it mentioned in the Changes file?

Do you really think one can easily see the difference? Remember that we do polling in bytecode mode at every function call and loop iteration! Also, do you think that nowadays many people write native-incompatible OCaml code? I can add a * in Changes for this, but really it seems like a minor change to me, introducing an extremely minor incompatibility.

@gadmm
Contributor

gadmm commented Oct 1, 2019

Perhaps the simplest solution is that you simply change the default behavior of caml_check_urgent_gc in #8993? That way, we keep using caml_process_pending_events wherever it is safe.

I was thinking about that. I will take care of it.

Now, if we allow running a Memprof callback in this context, then there is no reason not to allow any other callback.

Relatedly, a very concrete question I have (see #8961 and 2b1bdd4) is whether you see a use in being able to raise from a memprof callback, and, if not, whether it would be preferable to completely forbid any exception arising from them. The reason is that there is almost certainly going to be a distinction between raising signal handlers and non-raising ones, in order to be able to yield() properly inside resource acquisition and release functions (I will have to ask the language designers if they have other solutions in mind, but I am quite confident that this is the way to go–please go to #8961 if you wish to contribute to the discussion). The answer is going to determine whether memprof callbacks can run inside acquisition and release functions: if we forbid exceptions arising from memprof callbacks, then we can call them during (and therefore profile) acquisition and release of resources (cf. the Fun.with_resource prototype #8962). If so, this has to be decided for 4.10, and:

  • implemented in 4.10, in the form of documentation saying that it is forbidden to raise inside them, to avoid a future breaking change,
  • later on, when needed, they would be changed to run inside an uncaught exception handler and with asynchronous callbacks delayed, similarly to how destructors are treated in Fun.with_resource.

Another possible answer would be that it is not that bad to delay memprof callbacks, and that we do not care that much (including about callbacks delayed by the thread yielding).

Do you really think one can easily see the difference? Remember that we do polling in bytecode mode at every function call and loop iteration! Also, do you think that nowadays many people write native-incompatible OCaml code? I can add a * in Changes for this, but really it seems like a minor change to me, introducing an extremely minor incompatibility.

It probably deserves an entry in the Changes file because it is a separate change; not sure about the '*' though: I agree that, if correct, the change is probably unnoticeable for the user. I expressed myself poorly: I was saying that people who develop OCaml can be interested in keeping track of the semantics of the language. For instance, people involved in BSP can be interested in knowing that there is now a better opportunity to have the same polling behaviour in native and bytecode. (I am so interested in that because it is important for the implementation of resource-management primitives: at the moment I cannot reason about polling points in bytecode, so the implementation is at best brittle, and at worst broken without me knowing and without solutions to fix it.)

@jhjourdan
Contributor Author

If so this has to be decided for 4.10, and: implemented in 4.10, in the form of documentation saying that it is forbidden to raise inside them, to avoid a future breaking change,

That's something we need to discuss a little more, and I am afraid this is not going to be ready for 4.10. My gut feeling for now is that I don't think exceptions are useful in memprof callbacks, but they may raise OOM/stack overflow exceptions that the user may want to handle...

Anyway, Memprof is unlikely to be ready for 4.10, and we will probably make large changes to its API (see #8920) before its release. So, as discussed with @gasche, what we will probably do is simply remove the Gc.Memprof module in the 4.10 branch (or at the very least mark it as experimental and likely to change in the near future). So, don't worry too much about that.

jhjourdan added a commit to jhjourdan/ocaml that referenced this pull request Oct 2, 2019
@jhjourdan
Contributor Author

It probably deserves an entry in the Changes file because it is a separate change,

See #9007

@lpw25
Contributor

lpw25 commented Oct 2, 2019

Anyway, Memprof is unlikely to be ready for 4.10

We were really rather hoping to get Memprof into 4.10. I don't think that a decision about what the behaviour should be when a memprof callback raises an exception is needed in the first version of the feature. We can easily change such behaviour later and anyone whose code was relying on the previous behaviour deserves to have their code break.

jhjourdan added a commit to jhjourdan/ocaml that referenced this pull request Oct 2, 2019
jhjourdan added a commit to jhjourdan/ocaml that referenced this pull request Oct 2, 2019
@gasche
Member

gasche commented Oct 2, 2019

There is a lot more work to be done to upstream the entirety of memprof than this specific discussion. Even if we assume that @stedolan will magically do all of the review and design API discussions, 4.10 sounds like a very tight deadline (and also fairly demanding for Jacques-Henri's volunteer time). It does sound more reasonable to me to (do our best to have statmemprof advance at a steady pace, but) not try to force it into 4.10.

@lpw25
Contributor

lpw25 commented Oct 2, 2019

It does sound more reasonable to me to [...] not try to force it into 4.10.

Sure, if it's not ready in time then it's not ready. I just wanted to make sure that it didn't get delayed by a fairly hypothetical discussion about exceptions coming out of the callbacks.

gasche added a commit that referenced this pull request Oct 2, 2019
@gadmm
Contributor

gadmm commented Oct 2, 2019

No problem. There seems to be an agreement that the user should not rely on exceptions coming out of memprof callbacks, but that we have more time on our hands to discuss the implications.

@gadmm
Contributor

gadmm commented Oct 8, 2019

I am still trying to understand the semantics of calling check_urgent_gc (the one that runs callbacks) from caml_alloc_small_dispatch in the interpreter.

In #8805, we would like to share the code for minor allocations in bytecode and native modes.

there is a more fundamental reason which is specific to this patch: when sampling an allocation, we want to call the Memprof callback as soon as possible. One very common case is allocating in the minor heap from OCaml code, and, in this case, we can arrange to call the callback immediately during the allocation. Now, if we allow running a Memprof callback in this context, then there is no reason not to allow any other callback.

With the current PR, check_urgent_gc is only called after a GC (i.e. when young_ptr exceeds young_trigger), unlike in native where this includes the cases where young_ptr exceeds memprof_young_trigger or young_alloc_end (in the latter case when a signal or finaliser is pending, a minor or major gc has been requested, or an asynchronous callback has just raised an exception). But, in addition to the call to check_urgent_gc after a GC, caml_alloc_small_dispatch also calls memprof callbacks, and only them, when young_ptr exceeds memprof_young_trigger. (Let me know if you think I may be misunderstanding the code.)

Would it not be more consistent to always call other callbacks alongside memprof callbacks? Also it would be more consistent with native to always call check_urgent_gc at the end of caml_alloc_small_dispatch in the interpreter so that it includes the case young_alloc_end. (In particular there would be no need to treat the interpreter specially inside caml_memprof_track_young.) I see that there is some complicated business inside caml_memprof_track_young and so that it may not be that simple.

I am not saying that there is a problem with the current code, I am trying to follow your motivations for the change.

I noticed a leftover caml_memprof_to_do variable which is only ever set to 0 and never read; can you please have a look? In addition, should it not set caml_something_to_do in case there are pending memprof callbacks? (It is fine right now, but I might have to do this for what I do with it.) (sorry for that one, I misread the code)

@jhjourdan
Copy link
Copy Markdown
Contributor Author

With the current PR, check_urgent_gc is only called after a GC (i.e. when young_ptr exceeds young_trigger), unlike in native where this includes the cases where young_ptr exceeds memprof_young_trigger or young_alloc_end (in the latter case when a signal or finaliser is pending, a minor or major gc has been requested, or an asynchronous callback has just raised an exception). But, in addition to the call to check_urgent_gc after a GC, caml_alloc_small_dispatch also calls memprof callbacks, and only them, when young_ptr exceeds memprof_young_trigger. (Let me know if you think I may be misunderstanding the code.)

You're right. I think there is a bug in caml_alloc_small_dispatch. We should call check_urgent_gc systematically in caml_alloc_small_dispatch. Also, I'm tempted to make signal handling in bytecode mode more uniform with native mode by setting young_limit in both modes. Let me write a PR.

Would it not be more consistent to always call other callbacks alongside memprof callbacks? Also it would be more consistent with native to always call check_urgent_gc at the end of caml_alloc_small_dispatch in the interpreter so that it includes the case young_alloc_end. (In particular there would be no need to treat the interpreter specially inside caml_memprof_track_young.) I see that there is some complicated business inside caml_memprof_track_young and so that it may not be that simple.

caml_memprof_track_young leaves the minor heap in an invalid state until the allocation actually takes place. So, no, we cannot call check_urgent_gc there. And anyway, if we did that, we would lose the guarantee that the minor heap is not full, which is required by Alloc_small. But wait for my PR; it will be something along these lines.

I noticed a leftover caml_memprof_to_do variable which is only ever set to 0 and never read, can you please have a look?

Well spotted. Fix incoming in the PR.

@jhjourdan jhjourdan mentioned this pull request Oct 9, 2019
gasche added a commit that referenced this pull request Oct 11, 2019
gadmm added a commit to gadmm/ocaml that referenced this pull request Oct 11, 2019
In ocaml#8691, caml_check_urgent_gc was merged with the function that runs
asynchronous callbacks. The rationale was that caml_check_urgent_gc
already runs finalisers, and so could have run any asynchronous
callbacks.

We agreed on a different strategy: we know that users could not rely
on asynchronous callbacks being called at this point, so take the
opportunity to make it callback-safe, like was done for allocation
functions.

The new check_urgent_gc no longer calls finalisers (nor any
callbacks), and instead two functions are introduced:

* caml_process_pending_callbacks : internal function to execute
  callbacks unconditionally.

* caml_process_pending_actions : public-facing function that checks
  for something to do, and executes all actions (GC and callbacks).

Archaeology shows that this matches the behaviour from before the work
started (modulo fewer unpredictable calls to finalisers). In
particular, some calls to caml_check_urgent_gc are removed because
archaeology showed that they were introduced to replace a direct call
to callbacks.
gadmm added a commit to gadmm/ocaml that referenced this pull request Oct 11, 2019
gadmm added a commit to gadmm/ocaml that referenced this pull request Oct 11, 2019
gadmm added a commit to gadmm/ocaml that referenced this pull request Oct 11, 2019
gadmm added a commit to gadmm/ocaml that referenced this pull request Oct 13, 2019
gadmm added a commit to gadmm/ocaml that referenced this pull request Oct 14, 2019
In ocaml#8691, caml_check_urgent_gc was merged with the function that runs
asynchronous callbacks. The rationale was that caml_check_urgent_gc
already runs finalisers, and so could have run any asynchronous
callbacks.

We agreed on a different strategy: we know that users could not rely
on asynchronous callbacks being called at this point, so take the
opportunity to make it callback-safe, like was done for allocation
functions.

The new check_urgent_gc no longer calls finalisers (nor any
callbacks), and instead two internal functions are introduced:

* caml_process_pending_callbacks : function to execute callbacks
  unconditionally.

* caml_check_urgent_gc_and_callbacks : function that checks for
  something to do, and then executes all actions (GC and callbacks).
@stedolan stedolan mentioned this pull request Jun 29, 2020
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants