Skip to content

Ensure the minor heap is actually empty before reallocating it#12879

Merged
NickBarnes merged 1 commit intoocaml:trunkfrom
mshinwell:really-empty-minor-heap
Aug 7, 2025
Merged

Ensure the minor heap is actually empty before reallocating it#12879
NickBarnes merged 1 commit intoocaml:trunkfrom
mshinwell:really-empty-minor-heap

Conversation

@mshinwell
Copy link
Copy Markdown
Contributor

The caml_domain_external_interrupt_hook strikes again. When caml_set_minor_heap_size is called, the minor heap is emptied by calling caml_minor_collection, before the existing minor heap for the domain concerned is freed and reallocated. Except that caml_minor_collection doesn't necessarily leave the heap empty, because of the hook.

To fix this, in this PR we use caml_empty_minor_heaps_once, which doesn't call the hook. The test case that was failing is a bit intermittent but I have reasonable confidence this fix has solved the problem.

I added an assertion at the end of caml_stw_empty_minor_heap_no_major_slice to show that the minor heap is indeed empty at the end of that function, since this isn't that obvious.

This makes me wonder: maybe under the debug runtime, we should make that hook always allocate a small value. At the moment the behaviour is intermittent because it is relying on thread yields.

(Fix originally from oxcaml/oxcaml#2208)

(cherry picked from flambda-backend commit 85a505610bb3b1900d05347651435b323620f48b)
(cherry picked from flambda-backend commit 3583cc128279820f5861b27fb58b095861fc8725)
Copy link
Copy Markdown
Contributor

@NickBarnes NickBarnes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo adding some more assertions. Again (as for #12875) it would be easier to work with or review this code if we had clearer design documentation. I know @stedolan has ideas for restructuring the STW systems to make the constraints clearer. I think that would help here, more separation of concerns. I would like to be able to assume that, after calling a function called something like caml_empty_minor_heaps_once(), the domain's minor heap is empty. But is it? Have finalizers (or memprof callbacks) run, and allocated something? We need to write this stuff down, and have assertions which check it.


CAML_EV_END(EV_MINOR_CLEAR);
caml_gc_log("finished stw empty_minor_heap");
CAMLassert(domain->young_ptr == domain->young_end);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be tempted to put this same assertion in at the end of caml_empty_minor_heaps_once().

gadmm added a commit to gadmm/ocaml that referenced this pull request Jan 5, 2024
- Fix two bugs reported by Mark Shinwell: ocaml#12879 and ocaml#12875
- Reasoning about multithreaded code: it is now safe to assume that no
  mutation happens on the same domain after allocating on the OCaml
  from C. (This reverts to OCaml 4 behaviour.)
Copy link
Copy Markdown
Contributor

@gadmm gadmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again thanks for digging up these issues. With a more drastic fix I propose at #12882, caml_minor_collection does ensure the assertion domain_state->young_ptr == domain_state->young_end, and so it fixes the current bug in a different way.

However I do not see anything wrong with this change which makes sense (we should remind ourselves to remove the outdated comment at some point if #12882 is accepted).

The added assertions are useful.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 5, 2024

As a non-expert, I find it very tricky that caml_minor_collection does not do what one would expect, and I find the approach of #12882 (where it does provide the expected invariant) more tempting, because it would let non-experts like myself contribute to the runtime code safely. I also agree that the extra assertions and explanations are helpful.

The present PR has enough support/reviews to be merged, but I would like to see what people think of #12882 before moving in this direction or rebasing it on top of #1882 #12882.

(Edit: typo fixed, thanks!)

@NickBarnes
Copy link
Copy Markdown
Contributor

I agree that the current behaviour is unclear/surprising ("violates POLA"). @gasche, can you confirm that you mean #12882 rather than #1882 ?

@mshinwell
Copy link
Copy Markdown
Contributor Author

(@gasche typo in your last comment, it's 12882)

I'm also interested in #12882 but will have to take time to look at it carefully. I think it might be better to push the two small fixes for the existing code, before making a larger change, though. (From what I have seen, the two fixes do solve at least the identified problems.)

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Jan 5, 2024

@mshinwell this makes sense

@NickBarnes
Copy link
Copy Markdown
Contributor

Having looked at #12882 in more detail, I think we should merge both that and this.

gadmm added a commit to gadmm/ocaml that referenced this pull request Feb 9, 2024
- Fix two bugs reported by Mark Shinwell: ocaml#12879 and ocaml#12875
- Reasoning about multithreaded code: it is now safe to assume that no
  mutation happens on the same domain after allocating on the OCaml
  from C. (This reverts to OCaml 4 behaviour.)
@Octachron
Copy link
Copy Markdown
Member

@NickBarnes , it looks that this PR should have been merged and has been forgotten by accident? Am I mistaken?

@NickBarnes
Copy link
Copy Markdown
Contributor

@Octachron I think you're right.

@NickBarnes NickBarnes merged commit b386fd5 into ocaml:trunk Aug 7, 2025
@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Aug 8, 2025

After the merge of #12882, only the two added assertions were still useful, and the added comment is erroneous and should have been removed before merging, as I explained in my comment above. Not sure what happened here but it appears that the PR author asked to delay the merge of #12882 for their review, but the review never arrived and then they never updated us nor this PR. @NickBarnes would you please kindly fix the erroneous comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants