Skip to content

Fixes to #8691#9027

Merged
gasche merged 5 commits intoocaml:trunkfrom
jhjourdan:fix_8691
Oct 11, 2019
Merged

Fixes to #8691#9027
gasche merged 5 commits intoocaml:trunkfrom
jhjourdan:fix_8691

Conversation

@jhjourdan
Copy link
Copy Markdown
Contributor

Thanks to @gadmm, we found a bug in #8691 (cf #8691 (comment)): async callbacks were not checked systematically in caml_alloc_small_dispatch, despite our desire.

Fixing this will be especially important once #8805 will be merged, since caml_alloc_small_dispatch will be the only entry point for handling async callbacks in native mode. Without this PR patch but with #8805, signal handling in native mode will be delayed until the next minor collection.

In addition, this PR fixes two other less important issues in #8691:

  • it gets rid of caml_memprof_to_do and caml_final_to_do, which was used in old versions of the PR Guarantee that allocation functions do no trigger callbacks when called from C #8691,
  • it makes the async callback mechanism use young_limit both in native and bytecode modes. This makes it more uniform, and since we are checking for async callbacks in caml_alloc_small_dispatch, there is no reason not to set young_limit when there is one pending.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

Cc @gadmm, @stedolan.

Copy link
Copy Markdown
Contributor

@gadmm gadmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Using young_limit uniformly is nice, masking will rely on that. A small question again about memprof_track_young below.

You mention caml_final_to_do but I didn't see it appear in the patches.

This PR is needed for #8805 but technically not for 4.10 given that the current behaviour, although it is not the one desired, is not problematic. However, if we want to avoid delaying #8993, I would need to know if the current PR is going to be merged very very soon because I'll need to rebase on it. (Same remark for the reversion in semantics of check_urgent_gc which is due for 4.10 in all cases and that I'll need to do separately if #8993 was to be delayed.)

@jhjourdan
Copy link
Copy Markdown
Contributor Author

You mention caml_final_to_do but I didn't see it appear in the patches.

Ah right. I forgot to push that.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Oct 9, 2019

You mention caml_final_to_do but I didn't see it appear in the patches.

Ah right. I forgot to push that.

look good to me

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Oct 9, 2019

A behaviour I noticed before but which is repeated with this patch: in long-running allocating C functions, since allocations do not reset caml_something_to_do, if a signal arrives or a finaliser is pending, then all subsequent small allocations are made slow because they will all poll and then decide to not do anything (until the callbacks are finally executed). Is this tradeoff accepted?

My implementation of masking tries hard to avoid that so maybe we can fix this in the future with masking.

@stedolan
Copy link
Copy Markdown
Contributor

stedolan commented Oct 9, 2019

This patch looks good to me. Well spotted/fixed!

It worries me that there is no automated testing of this. I don't think this should be too hard to write a test case for: do something which causes a callback to become pending, and then allocate and assert that the callback runs before the next GC cycle. There's an example of such a test in testsuite/tests/c-api/alloc_async.

A behaviour I noticed before but which is repeated with this patch: in long-running allocating C functions, since allocations do not reset caml_something_to_do, if a signal arrives or a finaliser is pending, then all subsequent small allocations are made slow because they will all poll and then decide to not do anything (until the callbacks are finally executed). Is this tradeoff accepted?

My implementation of masking tries hard to avoid that so maybe we can fix this in the future with masking.

FWIW, I think this overhead is fine. If there's a simple way to avoid it, then that would be nice, but I don't think it's worth spending much effort or runtime complexity on.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

A behaviour I noticed before but which is repeated with this patch: in long-running allocating C functions, since allocations do not reset caml_something_to_do, if a signal arrives or a finaliser is pending, then all subsequent small allocations are made slow because they will all poll and then decide to not do anything (until the callbacks are finally executed). Is this tradeoff accepted?

Well, yes, I thought that should be the price to pay. Do you have figures in order to evaluate the slowdown?

My implementation of masking tries hard to avoid that so maybe we can fix this in the future with masking.

Still, we would need to enter caml_alloc_small_dispatch, which sometimes requires registering extra roots... That still has some costs. Again, figures would help. But that's an independent discussion.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

It worries me that there is no automated testing of this. I don't think this should be too hard to write a test case for: do something which causes a callback to become pending, and then allocate and assert that the callback runs before the next GC cycle. There's an example of such a test in testsuite/tests/c-api/alloc_async.

Then, this test would only be relevant in native mode with #8805 because the bytecode interpreter checks caml_something_to_do directly. And in native mode, I am afraid that the test will fail on some platforms, because young_limit may be cached in a register...

@stedolan
Copy link
Copy Markdown
Contributor

stedolan commented Oct 9, 2019

Then, this test would only be relevant in native mode with #8805 because the bytecode interpreter checks caml_something_to_do directly

Good point. We should wait until #8805 is merged before adding this test.

And in native mode, I am afraid that the test will fail on some platforms, because young_limit may be cached in a register...

This is true for finalisers but not for signals, so the test should use signals. When a signal arrives, the register caching young_limit is updated from the signal handler. I think this is the right behaviour, actually: it's OK to delay a finaliser until the next GC cycle since finalisers are always delayed by a GC cycle, but signals should be handled promptly.

Still, we would need to enter caml_alloc_small_dispatch, which sometimes requires registering extra roots... That still has some costs. Again, figures would help. But that's an independent discussion.

(Agreed that this is an independent discussion, but if you're investigating this you can use CAMLdrop in the style of #8993 (comment) to avoid the overhead of registering roots in the fast case)

@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 9, 2019

I'm all for merging this (@stedolan, I am assuming that you also approve of the patch?), but I think that indeed writing a test would be very nice if it's not too much work.

@stedolan
Copy link
Copy Markdown
Contributor

stedolan commented Oct 9, 2019

I'm all for merging this (@stedolan, I am assuming that you also approve of the patch?), but I think that indeed writing a test would be very nice if it's not too much work.

I agree!

Then, this test would only be relevant in native mode with #8805 because the bytecode interpreter checks caml_something_to_do directly

Thinking about this again, isn't this only true for Memprof callbacks? Signal callbacks work fine before #8805, so a test using those in native code should fail before this PR and pass afterwards.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Oct 9, 2019

Then, this test would only be relevant in native mode with #8805 because the bytecode interpreter checks caml_something_to_do directly

Thinking about this again, isn't this only true for Memprof callbacks? Signal callbacks work fine before #8805, so a test using those in native code should fail before this PR and pass afterwards.

Currently (until #8805), in native, caml_garbage_collection correctly calls signal handlers both before and after this PR. IMO this is less of a bugfix than finishing implementing a feature that will only become visible once #8805 lands, in that sense a test isn't strictly necessary, and I agree with @jhjourdan if he says that it's complicated to test.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Oct 9, 2019

A behaviour I noticed before but which is repeated with this patch: in long-running allocating C functions, since allocations do not reset caml_something_to_do, if a signal arrives or a finaliser is pending, then all subsequent small allocations are made slow because they will all poll and then decide to not do anything (until the callbacks are finally executed). Is this tradeoff accepted?

Well, yes, I thought that should be the price to pay. Do you have figures in order to evaluate the slowdown?

My implementation of masking tries hard to avoid that so maybe we can fix this in the future with masking.

Still, we would need to enter caml_alloc_small_dispatch, which sometimes requires registering extra roots... That still has some costs. Again, figures would help. But that's an independent discussion.

I agree that it's a separate discussion, and I do not have figures. It is very dependent on the user's code. If the user follows the advice of regularly calling a process_pending_* function, then they are safe from it. The trick with masking in order to avoid entering caml_alloc_small_dispatch is simply not to change young_limit when the mask is on. You remember that a callback is pending with caml_something_to_do, and you use it to set the young_limit when you turn off the mask. I think the same principle can apply for this problem, but we can discuss it again over at #8961 (once it is ready again for review after I rebase it).

@jhjourdan
Copy link
Copy Markdown
Contributor Author

Currently (until #8805), in native, caml_garbage_collection correctly calls signal handlers both before and after this PR. IMO this is less of a bugfix than finishing implementing a feature that will only become visible once #8805 lands, in that sense a test isn't strictly necessary, and I agree with @jhjourdan if he says that it's complicated to test.

Actually, I think we can write a test, which would fail before this PR in bytecode. Just give me a little more time for that!

@jhjourdan jhjourdan force-pushed the fix_8691 branch 2 times, most recently from 44553d0 to f2f36c9 Compare October 11, 2019 09:15
@jhjourdan
Copy link
Copy Markdown
Contributor Author

I have just added a test which checks that signals are indeed handled at every allocation in the minor heap from OCaml code. I took care of not calling any function after raising the signal, so that we are sure that the signal is not handled by the other mechanism in interp.c. The test indeed fails without the current commit.

@gasche gasche merged commit 72869a3 into ocaml:trunk Oct 11, 2019
@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Oct 11, 2019

Thanks!

anmolsahoo25 pushed a commit to anmolsahoo25/ocaml that referenced this pull request Aug 25, 2020
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Jan 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants