Add a runtime flag to call abort(3) instead of raising Out_of_memory#1926
Add a runtime flag to call abort(3) instead of raising Out_of_memory#1926jorisgio wants to merge 1 commit intoocaml:trunkfrom
Conversation
|
I think we have previously discussed permanently changing this exception into a fatal error. I'm not sure it should be calling |
|
Without looking at the details, I think it is a good idea to revisit
I don't think so - these exceptions signal programmer error, are perfectly well defined, and they can be handled in the normal flow of a program, so they should not abort the program run. |
It's often harder to debug a weird error without a core dump to look at... |
I agree. One could have a similar discussion for Stack_overflow and "asynchronous" exceptions (raised from signal handlers or finalizers), although for asynchronous exceptions, one should be able to register a global handler as an OCaml function (which could decide to exit the application, or just to log the error and continue). |
|
I'm not convinced by the motivations. Catch-all exceptions should never be used except for finalization (followed by a reraise) or in the entry point of a program for hardening purposes (log then restart). Turning a run-time exception into an abort() is just making the program unable to handle the exceptional condition. Something is lost. What is gained? |
|
I strongly support this. |
Programs are already unable to handle allocation failure as I don't think this option should be added. Instead, out of memory should always be a fatal error and |
|
Related discussion on https://caml.inria.fr/mantis/view.php?id=7185 , about avoiding the fatal error case. |
Crash consistency.
In an ideal world, yes, but in practice it's hard to control all dependencies and I believe it makes sense to have catch all clauses in "safe" points to log and restart only part of the program. The main problem is that most exceptions signal local errors. In concurrent or multi thread programs, Another problem is that catching Out_of_memory gives you a false sense of security, and i've encountered cases which were hard to understand simply from the trace : if you are lucky, a gc will manage to release enough memory to unwind all the stack, run all finalization code between reraise, and exit. Basically i agree that it's not strictly necessary to have the option to abort in the runtime, and we discussed another possibility of writing a ppx to rewrite catchall clauses into catchall-except OOM and abort, but having the option to directly abort on oom would be much easier.
Maybe but i can still see where OOM can be useful : for instance when allocating a large array/bigarray and having this allocation fail, it's contained and easy to catch enough for programmer to try handling it. |
|
What I dislike about calling |
This does sound useful, but I think that the Out_of_memory exception is a poor way to express it (being asynchronous and unreliable). I still think we should remove it, but to support your use-case it might be worth adding |
damiendoligez
left a comment
There was a problem hiding this comment.
I've marked one line in memory.c where raising the exception is dependent on the GC's internal state. This is the one that makes Out_of_memory unreliable, and which should be changed to a fatal error.
Elsewhere, I think it does make sense to raise Out_of_memory and it's easy to recover from it:
- when creating a huge array, bigarray, or bytes
- when changing the minor heap size
- when marshalling or unmarshalling a huge value
So we should keep the exception in these cases.
At any rate, I don't think it's a good idea to introduce a run-time option to control this behavior. @jorisgio would you be willing to simplify your PR accordingly?
| free_extern_output(); | ||
| caml_raise_out_of_memory(); | ||
| } | ||
|
|
There was a problem hiding this comment.
This change is not correct. If you're in "raise OOM" mode, then you need to replay the trail and free the output before you can raise the exception.
runtime/memory.c
Outdated
| if (caml_init_out_of_memory_mode == 0) { | ||
| caml_raise_out_of_memory(); | ||
| } else { | ||
| abort(); |
There was a problem hiding this comment.
You should print an error message before aborting, like we do everywhere else when we use abort.
runtime/memory.c
Outdated
| @@ -498,7 +507,7 @@ static inline value caml_alloc_shr_aux (mlsize_t wosize, tag_t tag, | |||
| else if (caml_in_minor_collection) | |||
There was a problem hiding this comment.
This is the line that makes Out_of_memory unreliable. You should remove the test and always call caml_fatal_error in this case.
1002842 to
b19c879
Compare
|
Thanks. I've simplified to the one line changed you suggested, i'm going to test in the wild too. I'm not totally convinced it's a good idea to raise Out_of_memory while allocating a large string/array, but i believe your suggestion is a good middle ground and is indeed the biggest issue so i'm happy with that solution. The other change with caml_fatal_error is that it will not abort() but exit() and this seems to be the consensus. I'd rather have a coredump to debug this sort of issue, but i guess i'm heavily biased towards servers and network services development where the dev is also the user. |
|
Hmm, I think maybe |
IIUC |
FWIW, I agree completely with this. This makes |
|
Some programs, such as libvirt, depend on handling out-of-memory situations. Libvirt, in particular, includes OCaml code that would be broken by this change ― it catches |
|
@DemiMarie Could you provide a pointer to the code in libvirt which does this? |
It is interesting to read about the handling of OOM in other languages, and in particular the reasons why Rust has decided go from aborting to giving an option to raise an exception (https://github.com/rust-lang/rfcs/blob/master/text/2116-alloc-me-maybe.md). This RFC and the two linked discussions are worth reading (as well as several earlier threads linked from there; it's a bit long, but in any case I read it for you). They refute several misconceptions about the difficulty of handling OOM safely, and they contain several convincing examples of use-cases. A recurring example of use-case is that of servers (mentioned also by @jorisgio), where one is ready to have a request be aborted to let others continue; this is made safe thanks to the isolation of requests. The embedded use-case is also mentioned, but for this they prefer In Rust, the exception-safety model is the one @xavierleroy alluded to:
In other words, it asks for resource safety ( Here compositional means that (by analogy) one can write catch-alls in the middle of a program, not really anywhere like a normal Now in OCaml, an additional difficulty is that it is harder to control allocations. Let us focus on the use-cases where OOM is caused by trying to allocate too much (as opposed to being low on memory due to outside pressure). A difference is that in functional programming, it is more likely that a large allocation consists in a lot of small allocations. This is the situation in the use-case that @mlasson describes in MPR#7185. In contrast, the current proposed solution only caters to situations where OOM is caused by a possibly large allocation (this is also mentioned as a limitation by the Rust RFC for the non-raising half of their proposal). Moreover:
I think this is an important point.
The first step would also be responsible for allocating the landing pad evoked by Jacques-Henri as a solution for MPR#7185. |
|
Nine months later, this PR is coming back in radicalized form as #8628 ("Deprecate the |
|
I hope that I did not derail this PR by thinking out loud about an ideal design (that no programming language currently comes close to implement). |
|
this was superseded by #8630 |
|
I suggest we have a second look at the idea of adding a new runparam that causes fatal_error on OOM. I tried to understand the effects of the current patch that proposes to always stop with fatal_error in A new runparam to turn off the Out_of_memory exception would cater the needs of developers who just want to assume that memory allocation does not fail. I argue that such a switch is justified because you would want it even if the Out_of_memory exception was properly implemented. Indeed, even then, graceful handling of memory exhaustion is not automatic and is likely to require additional care from the developer. But in many situations, developers are in a model where it is assumed that allocations never fail (witness overcommitting). Having a way to choose whether to immediately abort on OOM has precedents in Rust, C++, and some malloc implementations. I refer to the experience of Rust developers (https://github.com/rust-lang/rfcs/blob/master/text/2116-alloc-me-maybe.md):
For OCaml, it is desirable to leave the current behaviour to raise Out_of_memory as a default, e.g. for the case of the REPL user. It fixes the current issue without requiring a breaking change. It does not make the Out_of_memory exception reliable, which is a problem left open. @jorisgio, do you like the idea and do you want to reopen the PR and update your former patch? Or I can do it myself in a new PR if nobody is against. |
|
Personally i got hooked it by this proposal of having a global handler that by default raises abort but a user can register a callback to customize behavior. It looks like rust lang item feature, which is nice to use. The good thing about it is that you can log custom info about heap and program specific data before aborting. |
|
Since #8630 has been merged, it is understood that fatal_error already includes such a hook. It remains to call fatal_error on out_of_memory. Note, given that fatal_error_hook is called during a failed minor gc, it should probably be documented that the hook should not call the ocaml runtime. |
Currently, if allocation fails, runtime raises
Out_of_memoryexception. This exception is almost impossible to handle properly as it can be raised from virtually any allocation point, and will be caught by any catch all clauses.This pr adds an OCAMLRUNPARAM option to turn this exception into an abort call.
Alternatives:
Concerns
Some library features are well contained and it might be worth keeping the out_of_memory behavior for them, for instance bigarray creation or marshal.
Status
this PR is wip but i'd like to know if this approach looks right before adding tests etc.