Reorder call to caml_reset_young_limit and tighten assertion#12875
Reorder call to caml_reset_young_limit and tighten assertion#12875mshinwell wants to merge 1 commit intoocaml:trunkfrom
Conversation
|
Ah yes - I meant to mention statmemprof. I'm not sure what the situation is here - @NickBarnes should be able to advise. |
|
@damiendoligez your thoughts would be welcomed! |
|
#12879 also relates to the hook mentioned here, fwiw. |
|
This looks right to me - certainly we should reset the I am less clear that we should rule out equality in the assertion, but I think that we only reset the limit at times when we know what (*) More exactly, |
- Fix two bugs reported by Mark Shinwell: ocaml#12879 and ocaml#12875 - Reasoning about multithreaded code: it is now safe to assume that no mutation happens on the same domain after allocating on the OCaml from C. (This reverts to OCaml 4 behaviour.)
|
My interpretation of this issue and the one at #12879 is that one should definitely not call Note that in OCaml 4, systhread preemption used to be implemented with signals, and thus delayed to a safe point, as in #12882. This also ensured some property for reasoning in FFI code about concurrent mutations. Thanks for the digging up these issues. |
OK, the invariant for calling Also, I am all for adding assertions that helped catch bugs in practice, but the bug is that |
gadmm
left a comment
There was a problem hiding this comment.
I propose to remove what is related to the assertion tightening, to have the PR included quickly as a fix.
I am curious about continuing the discussion for clarifying invariants about when caml_reset_young_limit could/should be called.
|
I agree with everyone that moving the call to As for the assertion, I don't think you can tighten it safely. As hinted by @NickBarnes and @gadmm, I think that would interfere with statmemprof because an assignment to Consider the following scenario:
While I haven't looked at the new statmemprof code, the invariant here is really that |
- Fix two bugs reported by Mark Shinwell: ocaml#12879 and ocaml#12875 - Reasoning about multithreaded code: it is now safe to assume that no mutation happens on the same domain after allocating on the OCaml from C. (This reverts to OCaml 4 behaviour.)
|
I think @damiendoligez's analysis is right. |
Here is a nice straightforward one for the New Year!
In this PR on the Flambda 2 repo, which is 5.1.1 plus a few backports, I describe a bug where this assertion fails:
Note that this branch does not contain #12742 which changed this assertion to be
>=. Neither does it contain #12824 which made the same change after #12382 inadvertently reverted it to>.I believe the bug happens because
caml_domain_external_interrupt_hook, which can allocate e.g. ifsysthreadsis used, is called beforecaml_reset_young_limit. Please see the first link above for more details. This PR aims to fix the bug ontrunkin the same way.Now, from what I understand from @jmid , #12742 was introduced to fix the above assertion failure which had also been seen (suspiciously!) only when running
systhreadstests. Unfortunately I suspect this was the wrong diagnosis and it was actually the problem with the hook being seen. (The equalities between the various GC pointers aren't the same as in my first PR above at the point of failure, but I think it is very likely another form of the same problem.)Let's suppose for a moment it isn't the hook at fault and examine the situation in #12742 again. For
caml_poll_gc_workto setyoung_trigger = young_startitself and to haveyoung_ptr = young_triggerat the time of the assertion, it seems like you need an allocation large enough such that it is performed when the trigger is still at the midpoint (otherwisecaml_poll_gc_workwould have forced a minor collection), but it fills the heap entirely (and exactly). (Call this Condition 1.) I don't believe that could happen with the parameters given on #12742: the maximum object size is 256 words + 1 header word, so that isn't capable of overflowing from the first half of the heap to exactly the very end, if there are 1024 words in the heap.I discussed this with @jmid who reported today: "To further confirm, this morning I cloned a fresh 5.1.1, reverted #12742 and moved
caml_reset_young_limitas in PR flambda-backend/2204 and reran themulticoreteststest suite without any issues (well, technically there was an instance of a very long running test but that is an existing, known issue)".Given this, and the fact that the strict
>in the assertion has caught an obscure bug, I would prefer to see this assertion reverted to>ontrunk. That is assuming my reasoning is correct, of course, which it may well not be.One problem remains: what happens if the minor heap is so small that an allocation can produce Condition 1 above? I think this might mean that the assertion can be reached correctly with the equality holding. What I've done in this PR is to simply disallow this situation by increasing the minimum size of the minor heap. It is still very small (512 words) which seems fine, even for test cases that deliberately reduce the heap size -- but to my mind, it makes this extraordinarily subtle code rather easier to reason about.