Trigger a minor GC when custom blocks accumulate in minor heap #1476
damiendoligez merged 6 commits into trunk from
Conversation
When a custom block is allocated with caml_alloc_custom, one specifies a ratio (mem/max) which is used to increase the pressure on the major GC. But before this commit, there was no logic to put pressure on the minor GC. Consequently, a trivial loop that allocates rather large bigarrays and throws them away immediately will trigger out-of-memory on systems with constrained memory, in particular on 32-bit systems. Before OCaml 4.03, custom objects with finalizers (which include bigarrays) were never allocated in the minor heap, which reduced the risk of hitting the problem. This commit replicates the logic used to trigger the major GC: when the sum of the ratios of the custom blocks allocated in the minor heap exceeds 1, a minor collection is forced.
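The heuristic can be sketched as a standalone simulation. The names below (`extra_minor`, `note_custom_alloc`) are illustrative only, not the runtime's actual symbols, and the reset stands in for what a real minor collection would do:

```c
/* Fraction of out-of-heap resources held by custom blocks in the
   minor heap; a minor collection empties the minor heap, so the
   counter is reset when a collection is forced. */
static double extra_minor = 0.0;

/* Called on each custom-block allocation with the (mem, max) pair
   given to caml_alloc_custom; returns 1 if a minor GC is forced. */
static int note_custom_alloc(unsigned long mem, unsigned long max)
{
  if (mem == 0) return 0;      /* no extra resources to track */
  if (max == 0) max = 1;       /* avoid division by zero */
  extra_minor += (double) mem / (double) max;
  if (extra_minor > 1.0) {
    extra_minor = 0.0;         /* stand-in for the forced minor collection */
    return 1;
  }
  return 0;
}
```

With ratio 1/5 per allocation, the sixth allocation pushes the sum past 1.0 and triggers a collection.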
byterun/custom.c
Outdated
add_to_custom_table (&caml_custom_table, result, mem, max);
/* Keep track of extra resources held by custom block in
   minor heap. */
caml_extra_heap_resources_minor += (double) mem / (double) max;
I am not knowledgeable enough to opine on the proposed strategy, but shouldn't we protect against invalid values of mem and max? (e.g. max == 0)
I've added some protection against mem = 0 (just an optimization) and max = 0, setting it to 1 instead, as caml_adjust_gc_speed does. That function also truncates mem to be <= max, but this seems useless (strictly speaking, it would matter only if the comparison against the threshold were >= 1. instead of > 1.). Note that both mem and max are unsigned integers, so there is no need to deal with negative values.
max = 0, setting it to 1
Actually, this might be useless as well (just let the ratio be +infinity, which would trigger the minor GC immediately; this is also the case with max = 1, mem > 0 anyway).
FTR, @lpw25 suggested taking the counter for custom blocks in the major heap into account as well when triggering the minor GC (i.e. considering the sum of ratios for custom blocks in the minor and major heaps). The change in the code is trivial. I'll let @damiendoligez comment on the preferred heuristic.

I don't think we should take the major-heap counter into account, since it won't be reset by a minor collection. If we do, then when that counter is almost full we'll do more minor collections than necessary.

@damiendoligez Apart from this point, do you have an opinion on the overall proposal?
damiendoligez left a comment
The change looks good in principle, but you are using a deprecated function to trigger the minor GC.
byterun/caml/minor_gc.h
Outdated
asize_t, asize_t);
extern void caml_oldify_one (value, value *);
extern void caml_oldify_mopup (void);
extern void caml_minor_collection (void);
You shouldn't export this function; it's deprecated.
byterun/custom.c
Outdated
if (max == 0) max = 1;
caml_extra_heap_resources_minor += (double) mem / (double) max;
if (caml_extra_heap_resources_minor > 1.0) {
  caml_minor_collection();
You should call
caml_request_minor_gc ();
caml_gc_dispatch ();
as done in array.c to trigger a minor collection.
Whilst just taking the major counter into account causes this bad behaviour, it still seems wrong to me that you can now use up twice the user-specified limit before trying to collect. Is there nothing we can do to try to obey the user's limit? The pathological case is when 99% of the limit is used up on the major heap and 1% on the minor heap, which, if we take both limits into account, will cause us to do a minor collection for every new 1% of the limit allocated. What is the behaviour we actually want in this case? Clearly, doing a major collection could free up more of the limit and allow for longer between minor collections, but how do we make that trade-off? One possibility might be to try to keep the ratio between the major limit and the minor limit proportional to the relative sizes of the major and minor heaps.
(I have not been involved in this discussion previously because I didn't understand some basic things, one of them corresponding to Damien's remark above: I didn't understand why people wanted to involve the major heap limit in deciding to make minor collections.)
This would make the minor limit very small in practice in most cases, meaning that it would be hard to allocate custom blocks without doing a lot more minor collections.
Considering that the current situation is much worse than that (since 4.03 at least) without producing many complaints, I'm tempted to believe that this factor of 2 is OK.
Some design space:
It sounds much simpler to have separate minor and major custom limits and let users choose both, as a more fine-grained alternative to picking only a total limit. We could pick a default ratio between the two (say, 1/5) and by default set the two so that they sum to the current total limit.
Refined proposal: we could have a "global limit", and then for each heap a "soft limit". We only force a collection when the total custom space reaches the global limit, but in that case we collect all the heaps that have reached their soft limit. This way,
A default choice for the minor and major soft limits still needs to be decided.
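For illustration, the "one hard limit, two soft limits" policy could be sketched as a pure decision function. All names and the 1/3 defaults below are hypothetical; nothing like this exists in the runtime:

```c
/* Hypothetical limits: one global hard limit plus a per-heap soft limit. */
static const double hard_limit = 1.0;
static const double soft_minor = 1.0 / 3.0;
static const double soft_major = 1.0 / 3.0;

/* Given current custom-resource usage per heap, decide which heaps to
   collect: bit 1 = minor, bit 2 = major.  Collection is only forced once
   the *total* usage reaches the hard limit; then every heap that has
   reached its own soft limit is collected. */
static int heaps_to_collect(double minor_usage, double major_usage)
{
  int res = 0;
  if (minor_usage + major_usage < hard_limit) return 0;
  if (minor_usage >= soft_minor) res |= 1;
  if (major_usage >= soft_major) res |= 2;
  return res;
}
```

A nonzero result only says which heaps have exceeded their soft limits once the hard limit is hit; how that decision would map onto the incremental major GC is left open.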
I'm surprised by the absence of an enthusiastic response to my "one hard limit, two soft limits" proposal, which still sounds excellent to me :-) @alainfrisch, @damiendoligez, @lpw25, are you less convinced? Should I try to propose a PR?
Honestly, I have no strong opinion. It might be better than the proposed approach, but it's also more complex and it changes the public API. The current state of the PR is, I believe, a strict improvement over the current situation: it fixes issues observed in the wild, and it has been more or less blessed by @damiendoligez. I think we should merge, and if time permits we can look for something even better in another PR before the next release.
I'm fine with that.
That sounds like a good approach to me.
@gasche Are you also ok with the incremental approach? @damiendoligez I've addressed the "requested change" (not using a deprecated function). Can you review again?
I certainly agree that consuming, at most, twice the desired size (the disappointing aspect of the proposed behavior) is better than OOM-ing. Note: the soft limit does not need to change the public API; for example, we can set 1/3 for both soft-limit ratios and we get something essentially similar to your proposal (except that workloads that allocate on both the minor and major heaps collect more often to stay below 1.0 utilization, instead of reaching 2.0 utilization in the worst case).
Gentle ping @damiendoligez |
  caml_request_minor_gc ();
  caml_gc_dispatch ();
  }
}
I think it would be nice to encapsulate this logic in a separate function like caml_adjust_gc_speed (caml_adjust_minor_gc_speed?). Also, comparison with the major-heap function suggests that including CAML_INSTR_* instrumentation code could be useful.
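Such a helper might look like the following sketch, mirroring the shape of caml_adjust_gc_speed. The name caml_adjust_minor_gc_speed is only the reviewer's suggestion, and the two runtime entry points are replaced by stubs so the sketch is self-contained:

```c
static double caml_extra_heap_resources_minor = 0.0;
static int dispatched = 0;

/* Stubs standing in for the real runtime entry points. */
static void caml_request_minor_gc (void) { }
static void caml_gc_dispatch (void)
{
  dispatched++;
  caml_extra_heap_resources_minor = 0.0;  /* minor heap emptied */
}

/* Suggested helper: accumulate the mem/max ratio and force a minor
   collection once the accumulated ratio exceeds 1. */
static void caml_adjust_minor_gc_speed (unsigned long mem, unsigned long max)
{
  if (mem == 0) return;
  if (max == 0) max = 1;
  caml_extra_heap_resources_minor += (double) mem / (double) max;
  if (caml_extra_heap_resources_minor > 1.0) {
    caml_request_minor_gc ();
    caml_gc_dispatch ();
  }
}
```

Two allocations with ratio 3/4 accumulate to 1.5, crossing the threshold on the second call.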
You can't decide to collect the major heap, because the major GC is incremental: the major heap is being collected continuously. You need to decide how much work the major GC will do for each word allocated by the program. I'm not sure how that would work with your soft-limit idea.
Attempt to address MPR#7100.
This also fixes MPR#7671, although this is still mysterious to me (since the repro case for that one forced a GC compaction before the final allocation that triggered the OOM).