Make Minor_heap_max and Max_domains OCAMLRUNPARAM options #10955
sabine wants to merge 22 commits into `ocaml:trunk` from …
…zed in caml_init_gc
…init_domains use local variables to calculate sizes
…f max_domains and minor_heap_max_wsz

    case 'M': scanmult (opt, &params.init_custom_major_ratio); break;
    case 'm': scanmult (opt, &params.init_custom_minor_ratio); break;
    case 'n': scanmult (opt, &params.init_custom_minor_max_bsz); break;
    case 'N': scanmult (opt, &params.minor_heap_max_wsz); break;

Isn't there an interaction between the `N` and `s` parameters? For soundness, `s <= N` must hold. I see three options here:

1. The runtime can raise an exception if `s <= N` doesn't hold. This makes it difficult for users.
2. Silently bump `N` to be at least as large as `s`, and choose the default values such that `N >= s`.
3. Remove the `N` parameter altogether and only have `s`, interpreting `N == s`. The downside is that the minor heap cannot be grown programmatically at runtime beyond its initial size, since it is already at the maximum size; it can only be shrunk. FWIW, depending on the default value for `N` in option 2, that option will also suffer from the same problem.

If it is not common to increase the size of the minor heap during execution, I would go for option 3, as it has one fewer parameter and is less confusing for users.
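To make the trade-off concrete, here is a small OCaml model of options 2 and 3. It is only a sketch under assumptions: the record fields `s` and `n` (both in words) and the helper names are hypothetical, not the runtime's actual parameter struct or option parsing.

```ocaml
(* Hypothetical model, sizes in words: [s] is the initial minor heap
   size and [n] the reserved maximum (the 'N' / Minor_heap_max option). *)
type params = { s : int; n : int }

(* Option 2: silently bump [n] so that [s <= n] always holds. *)
let normalize_bump p = if p.n < p.s then { p with n = p.s } else p

(* Option 3: there is no separate [n]; the maximum is the initial size. *)
let normalize_equal p = { p with n = p.s }

(* A later Gc.set-style request can only be honoured up to [n]. *)
let resize_allowed p new_size = new_size <= p.n

let () =
  (* s = 1M words chosen by the user, n = 256k words left at a default. *)
  let p = { s = 1_048_576; n = 262_144 } in
  let p2 = normalize_bump p in
  let p3 = normalize_equal p in
  Printf.printf "option 2: n = %d, growing to 2M words allowed: %b\n"
    p2.n (resize_allowed p2 2_097_152);
  Printf.printf "option 3: n = %d, shrinking to 128k words allowed: %b\n"
    p3.n (resize_allowed p3 131_072)
```

With `s` set to 1M words and `N` left at 256k words, both options end up with `n = s`. A later request to grow the heap beyond its start-up size is therefore refused, and only shrinking remains possible.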
Searching the Internet, I can't find any discussions where people suggest increasing the minor heap size during execution. I only saw people recommending increasing the minor heap size with the `s` option.
Are there other ways we could go about checking? Is it possible to search all opam packages for code that increases the size of the minor heap during execution?
Clearly, 3 has the least potential for confusing users.
I have a program with command-line parameters to tune the GC control knobs (see https://github.com/Gbury/dolmen/blob/7cd59bbeaa536056d19facdbdc724b104e891732/src/bin/options.ml#L252-L262, https://github.com/Gbury/dolmen/blob/7cd59bbeaa536056d19facdbdc724b104e891732/src/bin/options.ml#L316, and https://github.com/Gbury/dolmen/blob/7cd59bbeaa536056d19facdbdc724b104e891732/src/bin/options.ml#L369-L401). That's mainly because I find command-line arguments easier to use than environment variables, and because I once ran some benchmarks to tweak the default parameters a bit. On that note, there's also at least Coq, which increases the minor heap size at startup (though I think that falls under the definition of "during execution"); see https://github.com/coq/coq/blob/07537426ad106c721e67d513c6db2e9047131e7b/sysinit/coqinit.ml
CompCert also increases minor heap size at start-up:
https://github.com/AbsInt/CompCert/blob/85b1c4091e13dec13fe03f28e81b256c60f9f7ef/driver/Driver.ml#L403-L406
You could OPAM-grep for Gc.set: I suspect many uses of this function tweak the minor heap size.
I'm running the OPAM-grep, and so far, it looks like you are correct:
Using Gc.set to set the minor heap size is something that people commonly do.
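For reference, the idiom being grepped for typically looks something like the sketch below: a one-off `Gc.set` at start-up that enlarges the minor heap. It follows the spirit of the Coq and CompCert links above but is not copied from them, and the 1M-word target is an arbitrary illustrative value. Note that `Gc.minor_heap_size` is counted in words, not bytes.

```ocaml
(* Start-up idiom: enlarge the minor heap once, before any real work.
   The target is in words; 1M words is illustrative, not a recommendation. *)
let target_words = 1_048_576

let () =
  let ctrl = Gc.get () in
  (* Only ever grow: keep a larger size if OCAMLRUNPARAM already set one. *)
  if ctrl.Gc.minor_heap_size < target_words then
    Gc.set { ctrl with Gc.minor_heap_size = target_words };
  Printf.printf "minor heap: %d words\n" (Gc.get ()).Gc.minor_heap_size
```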
Changing heap sizes throughout runtime has been proposed/discussed too: https://sympa.inria.fr/sympa/arc/caml-list/2010-11/msg00323.html. I have sometimes had success with similar techniques, but would not suggest them as a general-purpose default.
The … You may want to increase the … (see `ocaml/testsuite/tests/parallel/domain_parallel_spawn_burn.ml`, lines 18 to 24 in bd015d0). You can specify the …
Co-authored-by: KC Sivaramakrishnan <kc@kcsrk.info>

    @@ -3,6 +3,7 @@
    include unix
    ** bytecode

Perhaps you need a separate `ocamlrunparam` for bytecode too? @shindere
Locally, the test case passes 20 times in a row.
Welcome to multicore hacking :-) The GitHub Actions VMs are quite good at smoking out non-determinism bugs. I still don't know whether the `ocamlrunparam` as written applies only to the native runs or to both of them.
I was thinking about these pesky programs that change the minor heap size programmatically. Maybe they are wrong (the runtime system knows best what the good size is) and should be told to stop doing that, or their requests to change size should be ignored. But assuming we want to keep supporting this idiom, here is one possible way to go about it.

All minor heaps should reside in a range of memory addresses that contains no other data, so that the "is young" test can be implemented as two pointer comparisons. Let's call this range of addresses "the minor area". Currently, "the minor area" is structured as N blocks of size S, where N is the max number of domains and S the max size for minor heaps, as fixed once and for all (before this PR) or determined when the runtime system starts (this PR). But many other arrangements can be considered, including dynamic allocation within the minor area...

The one parameter that must be determined when the runtime system starts is the total size of the minor area, so that an address range of this size can be reserved. After this, we can have a trivial dynamic allocator that implements malloc and free requests. These …

Additional benefit: with a given total minor area size, we can have one or a few domains with a huge minor heap, or a boatload of domains with small minor heaps... No need to decide in advance.
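A rough OCaml model of this proposal, under stated assumptions: addresses are plain integers, the allocator is a first-fit free list, and `create`, `alloc`, `release` and `is_young` are hypothetical names, not runtime functions. It is meant to show that the "is young" check depends only on the bounds of the whole area, however that area is split between domains.

```ocaml
(* Toy model of the "minor area": one range of [size] bytes reserved at
   startup, from which per-domain minor heaps are carved on demand.
   Addresses are plain ints; none of these names exist in the runtime. *)
type area = {
  start : int;
  size : int;
  mutable free : (int * int) list; (* free chunks (addr, len), sorted by addr *)
}

let create ~start ~size = { start; size; free = [ (start, size) ] }

(* However the area is partitioned, "is young" stays two comparisons. *)
let is_young a addr = addr >= a.start && addr < a.start + a.size

(* First fit: carve [len] bytes from the first free chunk large enough. *)
let alloc a len =
  let rec go acc = function
    | [] -> None
    | (addr, l) :: rest when l >= len ->
        let rest = if l = len then rest else (addr + len, l - len) :: rest in
        a.free <- List.rev_append acc rest;
        Some addr
    | chunk :: rest -> go (chunk :: acc) rest
  in
  go [] a.free

(* Return a chunk, keeping the list sorted and merging adjacent chunks. *)
let release a addr len =
  let rec insert = function
    | (addr', _) :: _ as l when addr' > addr -> (addr, len) :: l
    | chunk :: rest -> chunk :: insert rest
    | [] -> [ (addr, len) ]
  in
  let rec merge = function
    | (a1, l1) :: (a2, l2) :: rest when a1 + l1 = a2 ->
        merge ((a1, l1 + l2) :: rest)
    | chunk :: rest -> chunk :: merge rest
    | [] -> []
  in
  a.free <- merge (insert a.free)
```

With a 6 MB area, for example, three domains can take 2 MB each, or one domain can take 4 MB and another 2 MB. The split is only decided when heaps are requested.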
I understand that @sadiqj has experimented with various allocation strategies for the minor heaps that would be more flexible, but it was found to perform noticeably worse than the current approach, for unclear reasons (possibly NUMA effects). See the post-mortem at https://github.com/ocaml-multicore/ocaml-multicore/wiki/Domain-Local-Allocation-Buffers-Addendum
I forgot about that, thanks for the pointer! However, his scheme is more ambitious than mine. With his approach, chunks of the minor area could be acquired by a domain, then released at the next minor GC, then reacquired by another domain. This is a very clever way to address the case of domains having drastically different allocation rates. But I'm not surprised NUMA systems react badly to chunks of memory being "moved" from one core to another this way. My proposal is much, much more modest: every domain still owns a fixed subset of the minor area; it's just that this subset is determined dynamically at domain creation time rather than being partitioned a priori. Maybe that's still well suited to NUMA?
If I understand this correctly, there may still be an issue after minor heap resizing: a domain may end up with memory that belonged to a different domain before the resize, and that memory may be on a different core/socket. One option to avoid this (if we expect resizing minor heaps to be an infrequent operation) might be to munmap and mmap the minor heap? I also wasn't 100% convinced that the DLABs results @gasche mentioned were all down to NUMA issues; there seemed to be some weird cache effects going on that we didn't understand.
I would expect minor heap resizing to occur very rarely (like, once at the beginning of the program), so this may not be much of a concern. On the other hand, unmapping then remapping could be a good opportunity to change mapping flags and, e.g., force the memory to be allocated for good? (I'm probably talking nonsense here.) On the third hand, I'm not sure that unmapping then remapping always succeeds...

    free_domain_ml_values(p.ml_values);
    caml_failwith("failed to allocate domain");
    caml_failwith("failed to allocate domain. "
                  "You can set the 'Max_domains' OCAMLRUNPARAM parameter "

Should this be "max_domains (d)"? See https://github.com/ocaml/ocaml/pull/10955/files#diff-315666779650ff6271e537465a48cd484e1f4671fafbfaa2ed22207a0c0aa038R132.
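A hypothetical sketch of reading a `d` (max domains) setting and sizing per-domain tables once at startup, the shape of this PR's change from fixed-length `Max_domains` arrays to arrays allocated from `caml_params->max_domains`. It is written in OCaml for brevity rather than as the runtime's C. The `k`/`M`/`G` suffixes follow the usual OCAMLRUNPARAM multipliers (2^10, 2^20, 2^30). `scan_mult`, `lookup` and `domain_table` are illustrative names; the first only loosely echoes the runtime's `scanmult`. The default of 16 is the value proposed in this PR.

```ocaml
(* Hypothetical sketch: read a 'd' (max domains) setting from an
   OCAMLRUNPARAM-style string and size per-domain tables once, at
   startup. The parser is illustrative, not the runtime's C code. *)
let default_max_domains = 16

(* "15" -> 15, "256k" -> 262144, "2M" -> 2097152, "1G" -> 1073741824 *)
let scan_mult s =
  let n = String.length s in
  if n = 0 then None
  else
    let mult, digits =
      match s.[n - 1] with
      | 'k' -> (1024, String.sub s 0 (n - 1))
      | 'M' -> (1024 * 1024, String.sub s 0 (n - 1))
      | 'G' -> (1024 * 1024 * 1024, String.sub s 0 (n - 1))
      | _ -> (1, s)
    in
    Option.map (fun v -> v * mult) (int_of_string_opt digits)

(* Find "letter=value" in a comma-separated list such as "s=256k,d=15". *)
let lookup param letter =
  String.split_on_char ',' param
  |> List.find_map (fun item ->
         match String.index_opt item '=' with
         | Some 1 when item.[0] = letter ->
             scan_mult (String.sub item 2 (String.length item - 2))
         | _ -> None)

let max_domains =
  match Option.bind (Sys.getenv_opt "OCAMLRUNPARAM") (fun p -> lookup p 'd') with
  | Some d when d > 0 -> d
  | _ -> default_max_domains

(* One slot per possible domain, allocated once max_domains is known,
   instead of a table whose length is fixed at compile time. *)
let domain_table : string option array = Array.make max_domains None
```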
One concern I have with such a dynamic scheme is that we may observe non-deterministic domain creation failures. For example, consider for the sake of argument that we create a 6 MB minor heap area to be shared among all of the domains, and that each domain starts with a default 2 MB minor heap. Consider the following program:

    let main () =
      let d1 = Domain.spawn (fun _ ->
        Gc.set { (Gc.get ()) with Gc.minor_heap_size = 4194304 (* 4M *) })
      in
      let d2 = Domain.spawn (fun _ -> ()) in
      Domain.join d1;
      Domain.join d2

The spawn of … This PR proposes a simpler API. You can have … It is a bit unsatisfying that Coq will fail with a fatal error if the default … is not so bad with … Overall, I feel that this PR is proposing useful improvements …

I am keen to get to a state where the PR is forwards compatible with the experiments that we would like to do on alternate minor heap designs, and provides a simple API to the users.
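To illustrate the concern, here is a toy OCaml model of two possible schedules of a program like the one above. Its assumptions are deliberately simplified: only the total size in use is tracked (no fragmentation), `main` and `d1` are taken to be already running with 2 MB each, and `Spawn`/`Resize` are hypothetical stand-ins for `Domain.spawn` and the `Gc.set` call.

```ocaml
(* Toy model: a 6 MB minor area shared by all domains, 2 MB per domain
   by default, with "main" and "d1" already running. Only the total in
   use is tracked (fragmentation ignored). [Spawn] and [Resize] are
   hypothetical stand-ins for Domain.spawn and a minor-heap Gc.set. *)
let mb = 1024 * 1024
let area_size = 6 * mb

type event = Spawn of string | Resize of string * int

let run events =
  let heaps = Hashtbl.create 8 in
  Hashtbl.replace heaps "main" (2 * mb);
  Hashtbl.replace heaps "d1" (2 * mb);
  let used () = Hashtbl.fold (fun _ size acc -> size + acc) heaps 0 in
  let step outcomes = function
    | Spawn d ->
        if used () + 2 * mb <= area_size then begin
          Hashtbl.replace heaps d (2 * mb);
          (d ^ " spawned") :: outcomes
        end else (d ^ ": failed to allocate domain") :: outcomes
    | Resize (d, size) ->
        if used () - Hashtbl.find heaps d + size <= area_size then begin
          Hashtbl.replace heaps d size;
          (d ^ " resized") :: outcomes
        end else (d ^ ": resize refused") :: outcomes
  in
  (* fold_left processes events strictly left to right *)
  List.rev (List.fold_left step [] events)

let () =
  (* Same program, two schedules: which request fails depends on timing. *)
  List.iter print_endline (run [ Resize ("d1", 4 * mb); Spawn "d2" ]);
  List.iter print_endline (run [ Spawn "d2"; Resize ("d1", 4 * mb) ])
```

The first schedule prints `d1 resized` then `d2: failed to allocate domain`. The second prints `d2 spawned` then `d1: resize refused`. The same program fails in a different place depending on timing.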
Somewhat naive question: why does the address range need to be reserved once and for all at the beginning? I would have thought that a new, larger minor area could be mapped dynamically as the required size grows, with a minor collection done whenever that happens, since no pointers into the minor area survive a minor collection.
I agree. Resizing is a rare operation, and its performance is not a particular concern. On a resize request, we may perform a minor collection, and then in a stop-the-world section … Potentially, the same idea can work for the case when there are no free domain slots during a domain spawn.
Agreed. This was my suggestion earlier in the thread: we don't necessarily need to keep the same minor heap range, as long as the change is done during the STW section after the minor heap has been collected.
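A single-threaded toy model of that protocol, with hypothetical names throughout. A minor heap is a bump-pointer range, `minor_collection` "promotes" every live young object to a fake major-heap address, and only then does `resize` swap in a new range. The stop-the-world synchronisation a real runtime would need is deliberately left out.

```ocaml
(* Toy model of the resize protocol: collect first, then swap ranges.
   After [minor_collection], no object refers into the old range, so it
   can be replaced. All names are hypothetical; the stop-the-world
   synchronisation of a real runtime is omitted. *)
type obj = { mutable addr : int; mutable young : bool }

type minor = {
  mutable lo : int;
  mutable hi : int;
  mutable ptr : int;       (* bump pointer *)
  mutable live : obj list; (* young objects still reachable *)
}

let next_major = ref 1_000_000_000 (* fake addresses in the major heap *)

let alloc m size =
  if m.ptr + size > m.hi then None
  else begin
    let o = { addr = m.ptr; young = true } in
    m.ptr <- m.ptr + size;
    m.live <- o :: m.live;
    Some o
  end

(* "Promote" every live young object, then empty the minor heap. *)
let minor_collection m =
  List.iter
    (fun o -> o.addr <- !next_major; o.young <- false; incr next_major)
    m.live;
  m.live <- [];
  m.ptr <- m.lo

let resize m ~new_lo ~new_size =
  minor_collection m; (* the old range is now unreferenced *)
  m.lo <- new_lo;
  m.hi <- new_lo + new_size;
  m.ptr <- new_lo
```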
This is not acceptable. Tweaking OCAMLRUNPARAM is OK for developers but not for end-users. (Plus, Windows users don't even know how to set an environment variable.) Standalone applications such as Coq or CompCert must be able to set their run-time parameters (minor heap size, max number of domains, etc.) programmatically, at the beginning of execution. It's OK if this can only be done before the first domain is spawned. Several ways to do this have been proposed in this PR. Why don't we agree first on functionality, then on a design, and only then submit and review PRs?
Clearly, it is unacceptable to break existing programs by requiring an environment variable to be set for them. What if we keep the default … It has become clear during the discussion that, moving forward, there is potential in exploring a better solution for giving the programmer control over the virtual memory reservations performed by the runtime. I'm closing this in favor of continuing the discussion on #10971.
KC Sivaramakrishnan (2022/01/28 06:41 -0800):
> @kayceesrk commented on this pull request.
>
> > @@ -3,6 +3,7 @@
> >  include unix
> >  ** bytecode
>
> Welcome to multicore hacking :-) The GitHub Actions VMs are quite
> good at smoking out non-determinism bugs. I still don't know whether
> the `ocamlrunparam` as written applies only to the native runs or both
> of them.

It applies the same way as the OCAMLRUNPARAM environment variable does, I think.
Thanks for the clarification @shindere.
This implements ocaml-multicore/ocaml-multicore#795.

`Minor_heap_max` and `Max_domains` become runtime parameters set through OCAMLRUNPARAM. Manual entries added.

Multiple fixed-size static arrays of length `Max_domains` turned into array pointers. Allocations happen at program startup based on `caml_params->max_domains`.

A new default of `16` for `Max_domains` is proposed. The default for `Minor_heap_max` is unchanged at 256k.

Valgrind runs with `OCAMLRUNPARAM="d=15"`, but fails with the default settings: `Fatal error: Not enough heap memory to start up.`

For running AFL, we need to use a lower number of domains to stay below AFL's 50MB virtual memory limit, or use the `-m none` option. Updated the manual to add this information. Modified the AFL test case to use OCAMLRUNPARAM instead of the `-m none` option.

Discuss: