Use abort instead of exit(2) in caml_fatal_error. by jhjourdan · Pull Request #8630 · ocaml/ocaml

jhjourdan · 2019-04-19T19:19:19Z

This makes it possible to get a core dump in case of a fatal error, which makes debugging easier.

This has been suggested in #8628.

dbuenzli · 2019-04-19T21:12:40Z

A git grep 'exit(' in runtime shows there are a few other places in the runtime system where it exit(2). I think at least one of these can safely be turned into caml_fatal_error: it's the first hit in win32.c. I'm unclear about those in fail_{byt,nat}.c.

The other ones and those calling caml_sys_exit are more tricky since they also involve Stdlib.at_exit handlers which you might not expect to run before an abort(). In particular changing the default behaviour in case of uncaught exception in printexc.c might not be a great idea and this one the user program can at least prevent it from happening and messing with its own exit usage.

jhjourdan · 2019-04-20T11:56:38Z

Well spotted! I turned calls to exit(2) in fail_xxx.c and win32.c into fatal errors. I also fixed one typo and removed spurious calls to exit(-1) in the debugger.

stedolan · 2019-04-23T14:09:42Z

I think this is a good idea: when the OCaml runtime fails, it is better to abort() than to exit with an error code.

However, caml_fatal_error is also used in the bytecode runtime for several more boring errors (e.g. if you run ocamlrun file_that_does_not_exist, you get a fatal error). I don't think these situations should trigger a SIGABRT.

I think the right thing to do here is to have caml_fatal_error call abort, but call exit(N) instead in these situations in runtime/startup_byt.c. I'm not sure what the exit code should be: ocaml currently exits with 2, but I think 127 is a more normal code for when an executable file cannot be found.

jhjourdan · 2019-04-23T16:04:28Z

I think this is a good idea: when the OCaml runtime fails, it is better to abort() than to exit with an error code.

However, caml_fatal_error is also used in the bytecode runtime for several more boring errors (e.g. if you run ocamlrun file_that_does_not_exist, you get a fatal error). I don't think these situations should trigger a SIGABRT.

I think the right thing to do here is to have caml_fatal_error call abort, but call exit(N) instead in these situations in runtime/startup_byt.c. I'm not sure what the exit code should be: ocaml currently exits with 2, but I think 127 is a more normal code for when an executable file cannot be found.

Done.

xavierleroy

I'm OK with the general idea: an internal error in the runtime system should abort() rather than exit(2).

However, not all calls to caml_fatal_error represent internal errors in the runtime system. Some are errors in the OCaml program being executed, and should not abort().

One instance is when the bytecode executable cannot be found, and you already took care of that.

Another instance, detailed below, is when the C code tries to raise an OCaml exception and the OCaml code isn't initialized yet. This case was carefully reported like an uncaught exception, and you turned it into an abort(). This is debatable.

runtime/fail_byt.c

runtime/fail_nat.c

runtime/startup_byt.c

jhjourdan · 2019-05-28T13:01:18Z

Another instance, detailed below, is when the C code tries to raise an OCaml exception and the OCaml code isn't initialized yet. This case was carefully reported like an uncaught exception, and you turned it into an abort(). This is debatable.

As I explained, I think it should be considered UB to call OCaml runtime functions if said runtime is not initialized. Most of the external functions of the OCaml runtime requires it to be initialized anyway (allocation...). I think the test in check_global_data is only there because intern can raise an exception while decoding the bytecode executable.

If you don't agree, then I'll be happy to go back to the previous version (which is just a little less factorized). I just need to know what you think to move forward :)

ygrek · 2019-05-28T14:43:41Z

I'm OK with the general idea: an internal error in the runtime system should abort() rather than exit(2).

ftr now I strongly support @dbuenzli idea of overridable C handler for this kind of unexpected situations. E.g. in our usecase we want to output log in some specific format so that our monitoring tools can track this situation same way as all other logs. Also from practice - we may not always want to do a coredump on oom (this contradicts what I said before, but that's what practice taught me - consuming all IO of a rood disk to get a 200G coredump is not always desirable).

xavierleroy · 2019-05-28T17:27:08Z

The size of core dumps is not a factor here: just use ulimit -c N to limit the max size of core dumps, or ulimit -c 0 to make sure that core dumps are never generated.

This said, I am sympathetic to the idea that users could provide their own handler function instead of the default abort().

jhjourdan · 2019-05-28T17:29:08Z

ftr now I strongly support @dbuenzli idea of overridable C handler for this kind of unexpected situations.

Isn't it possible to install a signal handler for SIGABRT?

xavierleroy · 2019-05-28T17:30:08Z

As I explained, I think it should be considered UB to call OCaml runtime functions if said runtime is not initialized.

This is a reasonable claim that is open for discussion. I still think you need to know why the code you're modifying is written the way it is written, and be aware of what you are changing, rather than changing it without understanding it.

ygrek · 2019-05-28T18:42:17Z

The size of core dumps is not a factor here: just use ulimit -c N to limit the max size of core dumps, or ulimit -c 0 to make sure that core dumps are never generated.

We still want to get (full) coredumps for real segfaults..

Isn't it possible to install a signal handler for SIGABRT?

maybe, but I think explicit handler is better mechanism

dbuenzli · 2019-05-28T19:33:36Z

Isn't it possible to install a signal handler for SIGABRT?

In the handler the context is lost, is the abort coming from OCaml or from C ? If my main is in C I may simply choose to shutdown the OCaml system and restart it.

I think it's a good idea if the OCaml runtime strives to be treated as C library. In other words that it tries as much as possible not to fiddle with the non compositional aspects of the C runtime/standard library.

jhjourdan · 2019-05-30T23:29:10Z

@xavierleroy : alright, I restored the initial code for the three changes you mentioned.

@dbuenzli , @ygrek : your wish is granted. caml_fatal_error now calls caml_fatal_error_hook if not NULL. Note, however, that this hook is not called in case of unhandled exception or failed startup.

xavierleroy

The diff is getting shorter, I like it :-) More seriously: this looks very good to me.

xavierleroy · 2019-06-03T14:25:07Z

Do I understand correctly that the hook mechanism answers @ygrek's concerns about core dumps? Or is there still a problem with standalone applications?

ygrek

I believe this covers all my concerns, yes

Changes

jhjourdan · 2019-06-03T17:59:13Z

I rebased, squashed the commits, and mentionned the new hook in Chances. @xavierleroy : if this looks good to you , then this should be ready to merge.

This makes it possible to get a core dump in case of a fatal error, which makes debugging easier. This also - turns a use of exit(2) to call to caml_fata_error - replaces some uses of caml_fatal_error to a call to exit(127), when the error results in misuse of the runtime - removes some useless calls to exit(-1)

When not NULL, this hook is called when a fatal error is encountered instead of printing an error message on stderr.

jhjourdan · 2019-07-15T08:18:07Z

@xavierleroy: you approved these changes one 1.5 months ago. Is there anything blocking the merge?

xavierleroy · 2019-07-31T09:21:42Z

I guess I was hoping someone else would bother to merge :-)

Let's merge, then.

jhjourdan mentioned this pull request Apr 19, 2019

Deprecate the Out_of_memory exception #8628

Closed

jhjourdan force-pushed the fatal_error_abort branch from 2dcdc7a to 9749f90 Compare April 19, 2019 19:51

jhjourdan force-pushed the fatal_error_abort branch from c621132 to a3ff974 Compare April 20, 2019 12:04

jhjourdan force-pushed the fatal_error_abort branch from a3ff974 to 83df152 Compare April 23, 2019 16:00

xavierleroy reviewed Apr 25, 2019

View reviewed changes

runtime/fail_byt.c Outdated Show resolved Hide resolved

runtime/fail_byt.c Outdated Show resolved Hide resolved

runtime/fail_nat.c Outdated Show resolved Hide resolved

runtime/startup_byt.c Show resolved Hide resolved

jhjourdan force-pushed the fatal_error_abort branch from 83df152 to 77839c9 Compare May 28, 2019 12:46

xavierleroy approved these changes Jun 3, 2019

View reviewed changes

ygrek approved these changes Jun 3, 2019

View reviewed changes

Changes Show resolved Hide resolved

jhjourdan force-pushed the fatal_error_abort branch from d86044b to 1a79cd2 Compare June 3, 2019 17:57

jhjourdan force-pushed the fatal_error_abort branch from 1a79cd2 to d1ea8d5 Compare June 6, 2019 14:31

jhjourdan added 2 commits June 8, 2019 23:04

Add [caml_fatal_error_hook].

cddec18

When not NULL, this hook is called when a fatal error is encountered instead of printing an error message on stderr.

jhjourdan force-pushed the fatal_error_abort branch from d1ea8d5 to cddec18 Compare June 8, 2019 21:06

Merge branch 'trunk' into fatal_error_abort

a121072

Merge branch 'trunk' into fatal_error_abort

9d84841

xavierleroy merged commit 39cb5ad into ocaml:trunk Jul 31, 2019

jorisgio mentioned this pull request Aug 1, 2019

Add a runtime flag to call abort(3) instead of raising Out_of_memory #1926

Closed

gadmm mentioned this pull request Sep 22, 2019

Caml_fatal_user_error #8963

Open

hannesm mentioned this pull request Oct 13, 2019

use different exit codes in argv parsing mirage/functoria#179

Merged

dbuenzli mentioned this pull request Mar 26, 2024

Runtime: abort instead of exit on unimplemented js primitives ocsigen/js_of_ocaml#1590

Merged

Conversation

jhjourdan commented Apr 19, 2019

Uh oh!

dbuenzli commented Apr 19, 2019

Uh oh!

jhjourdan commented Apr 20, 2019

Uh oh!

stedolan commented Apr 23, 2019

Uh oh!

jhjourdan commented Apr 23, 2019

Uh oh!

xavierleroy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jhjourdan commented May 28, 2019

Uh oh!

ygrek commented May 28, 2019

Uh oh!

xavierleroy commented May 28, 2019

Uh oh!

jhjourdan commented May 28, 2019

Uh oh!

xavierleroy commented May 28, 2019

Uh oh!

ygrek commented May 28, 2019

Uh oh!

dbuenzli commented May 28, 2019

Uh oh!

jhjourdan commented May 30, 2019

Uh oh!

xavierleroy left a comment

Choose a reason for hiding this comment

Uh oh!

xavierleroy commented Jun 3, 2019

Uh oh!

ygrek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jhjourdan commented Jun 3, 2019

Uh oh!

jhjourdan commented Jul 15, 2019

Uh oh!

xavierleroy commented Jul 31, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants