Skip to content

Reserve only caml_minor_heap_max_wsz * Max_domains for the minor heap#11082

Merged
kayceesrk merged 1 commit intoocaml:trunkfrom
sabine:change_minor_heap_size
Mar 28, 2022
Merged

Reserve only caml_minor_heap_max_wsz * Max_domains for the minor heap#11082
kayceesrk merged 1 commit intoocaml:trunkfrom
sabine:change_minor_heap_size

Conversation

@sabine
Copy link
Copy Markdown
Contributor

@sabine sabine commented Mar 3, 2022

This addresses #10971 in the sense that

  • we do no longer reserve a huge, fixed-size-per-domain amount of memory for the minor heaps, instead the initial minor heap reservation is s * Max_domains
  • domains can use Gc.set to increase their minor heap size beyond the reservation (causes STW) or modify their minor heap size within the current reservation (doesn't cause STW)
  • the size of the reservation is always caml_minor_heap_max_wsz * Max_domains

It's now possible to use tools like AFL and Valgrind. The changes to Gc.set do not break existing code.

There is the possibility of a different pull request to add an OCAMLRUNPARAM option to set Max_domains, but we intentionally did not include this here to keep the patch minimal.

summary

When a domain uses Gc.set to increase its own minor heap size to be larger
than the current reservation, we reserve a new memory area for
all minor heaps. This is done inside a STW section:

  1. every domain frees its own heap,
  2. inside a global barrier, the final domain unreserves the old minor heap
    area, then reserves the new minor heap area. This domain also adjusts
    the minor heap area pointers of all domains while the other domains
    wait for the end of the barrier,
  3. every domain allocates its own heap with size Caml_state->minor_heap_wsz.

Co-authored-by: Gabriel Scherer gabriel.scherer@gmail.com

@xavierleroy
Copy link
Copy Markdown
Contributor

Looks nice, thanks! However the tests fail because you have a bunch of C functions that are not prefixed by caml_ and should therefore be declared static. Can you please sort this out?

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 4, 2022

This PR contains 20% of the feature set discussed in #10971 and 80% of the difficulty: we don't handle changing the maximum number of domains here, but we still have to un-map and re-map minor heaps at runtime, which is the hard part. We hope to be able to add other features easily on top. (@sabine and myself have prototyped those additional features, but we went back to the bare minimum when we had to track a segfault by staring inquisitively at the diff.)

In the middle of a Stop-The-World (STW) section, we unmap and then re-map the minor heap space, and we modify the dom_internal structure of all domains (whether they are registered as STW participants or not) to store the new boundaries of each domain reserved space (domain_self->minor_heap_area, domain_self->minor_heap_area_end). The fact that this works is not obvious, it relies on STW synchronization guarantees that are currently being documented in #11072.

When @sabine and myself wrote the code of the present PR, I was assuming that no code touching the domain structures could not run in parallel with STW sections (before a STW barrier), and that this justified the correctness of our change. In fact the assumption, as discussed in #11072, is not entirely correct: domain initialization code cannot run in parallel with STW sections, but domain cleanup code can.
Fortunately, neither the domain initialization nor the domain cleanup code currently mutate the minor heap reservation or its boundaries domain_self->minor_heap_area{,_end}. The variables are read (not mutated) if a domain decides to change its actual (committed) minor heap size, which only occurs through Gc.set in mutator code that cannot run in parallel with STW sections.

All this to say: reviews are warmly welcome!

(cc @Engil, who did a pre-review already, and @ctk21 who is most probably interested in looking at this.)

@gasche gasche force-pushed the change_minor_heap_size branch from aaab6c4 to 1c814d4 Compare March 4, 2022 21:14
@abbysmal
Copy link
Copy Markdown
Contributor

Sorry for the delay in reviewing this: I will be taking a look at this PR very shortly, at the very least by the end of Monday I will have a review for it.

Thank you very much for this work!

Copy link
Copy Markdown
Contributor

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good feature that fixes some real problems with large virtual allocations. I believe the code to work as intended.

I was surprised that a new STW section was introduced rather than just adding the functionality onto the end of the existing minor heap collection; you always have to empty the minor heaps to resize. However the implementation here works fine.

I see one possibility that might simplify things and deals with the FIXME in the code: I need convincing that we really need to expose both caml_set_minor_heap_size and caml_update_minor_heap_max.

Copy link
Copy Markdown
Contributor

@abbysmal abbysmal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this PR @sabine, @gasche !
I already did a preview a few weeks back with @gasche, and the changes looks good to me.
I have a few minor nitpicks; however I think the STW segment is pretty straightforward and has no flaws I can see.

As for the whether or not adding a new STW segment is preferable to reusing one: I tend to be a neutral about the notion. (at least here.)
We made similar changes when dealing with DLABs and I was not necessarily fond of extending an already complex STW segment (minor collection) to support resizing/slabbing of the minor segments.

I think the use case here is fine, and the general world of advice for dealing with STW segments should be to avoid touching or adding them as much as possible. (which I think @ctk21 has stated many times in the past :-))

Food for thoughts as well would be could this segment be reused for later extension of the minor heap allocator (the return of slabbing?).
Or would there be a mix of things between this new STW segment and the regular minor collection segment?
I think if we were to have a fancier, more dynamic allocator for the minor heap, if STW is required, reusing this segment rather than using the minor heap collection one should be fine.

Lastly, another item to ponder upon one day: I see @sabine and @gasche are also now enjoying the virtue of the burn-style tests! (a true Multicore staple.)
I wonder if we should one day factor some common bits in the ones we have, if only to have a streamlined control interface.
(which is a throwback to some discussions around the testsuite and parallel testcases)

runtime/domain.c Outdated
void* heaps_base;
/* minor heap initialization and resizing */

static uintnat get_minor_heap_reservation_size() {
Copy link
Copy Markdown
Contributor

@kayceesrk kayceesrk Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two places where get_minor_heap_reservation_size is used, the result is immediately passed to reserve_minor_heaps. Best to push get_minor_heap_reservation_size code into reserve_minor_heaps and delete that function.

@sabine sabine force-pushed the change_minor_heap_size branch 6 times, most recently from 5c94e6a to 3375448 Compare March 18, 2022 07:55
@sabine sabine force-pushed the change_minor_heap_size branch from 7cc6fdf to f231d7b Compare March 21, 2022 10:48
@ctk21
Copy link
Copy Markdown
Contributor

ctk21 commented Mar 21, 2022

My thinking was the second ("modify the code of the minor gc"); the argument for this would be that it keeps all the minor heap code in one place and we would just check to see if the resize code path is needed at the point we hit the minor GC collection barrier (that is the point that all domains signal they have collected their minor heap). An argument against is that the STW minor GC collection function is big and hairy.

It seems like you considered this in your design and decided what you are doing is better. That's fine!

@kayceesrk kayceesrk added this to the 5.0.0 milestone Mar 22, 2022
Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unsure what needs to be done to get this PR merged. At any rate here are silly comments on the test and the Changes entry. I'm not sure we need a Changes entry, but if we have one it should explain the changes w.r.t. 4.14, not w.r.t. Multicore OCaml.

@sabine sabine force-pushed the change_minor_heap_size branch from f231d7b to 72e3a3c Compare March 24, 2022 18:17
@sabine
Copy link
Copy Markdown
Contributor Author

sabine commented Mar 24, 2022

Yes, a Changes entry should not be needed because none of the possible issues arising from large heap reservation were ever exposed to the user. All we're doing here is to fix the memory reservation/allocation strategy in such a way that it does what the user already expects (i.e. minimal amount of reservation and allocation).

Looking at 4.14 as the reference point for the Changes entry is the correct thing to do. Thank you for reminding, it's something we need to take into account for all the multicore-related PRs.

Copy link
Copy Markdown
Contributor

@abbysmal abbysmal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a last look at this PR and besides one final nitpick to the eventlog_metadata.in file I think things looks good here.

When a domain uses Gc.set to increase its own minor heap size to be larger
than the current reservation, we reserve a new memory area for
all minor heaps. This is done inside a STW section:

1. every domain frees its own heap,
2. inside a global barrier, one domain unreserves the old minor heap
area, then reserves the new minor heap area. This domain also adjusts
the minor heap area pointers of all domains while the other domains
wait for the end of the barrier,
3. every domain allocates its own heap with size `Caml_state->minor_heap_wsz`.

Co-authored-by: Gabriel Scherer <gabriel.scherer@gmail.com>
@sabine sabine force-pushed the change_minor_heap_size branch from 72e3a3c to ffda59d Compare March 28, 2022 06:50
@kayceesrk kayceesrk merged commit d5de6aa into ocaml:trunk Mar 28, 2022
@kayceesrk
Copy link
Copy Markdown
Contributor

Thanks, @sabine and @gasche for the work, and all the reviewers for their careful reviews.

sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
sabine added a commit to sabine/ocaml that referenced this pull request Mar 29, 2022
Every call to `Gc.set` that modifies the minor heap size causes
a stop-the-world section, in order to adjust the minor heaps of all
domains.

On `domain_terminate()`, we free the minor heap to maintain the
invariant "only active domains have a minor heap allocated".

Resulting simplifications: we no longer keep track of individual
domains' minor heap size - instead, we have a global variable
`caml_minor_heap_wsz` for the size of the minor heap, and
every domain allocates a minor heap of this size.

This patch resolves the FIXME from ocaml#11082 relating to the duplicate
reallocation of the leader domain's minor heap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants