Skip to content

Use abort instead of exit(2) in caml_fatal_error.#8630

Merged
xavierleroy merged 4 commits intoocaml:trunkfrom
jhjourdan:fatal_error_abort
Jul 31, 2019
Merged

Use abort instead of exit(2) in caml_fatal_error.#8630
xavierleroy merged 4 commits intoocaml:trunkfrom
jhjourdan:fatal_error_abort

Conversation

@jhjourdan
Copy link
Copy Markdown
Contributor

This makes it possible to get a core dump in case of a fatal error, which makes debugging easier.

This has been suggested in #8628.

@dbuenzli
Copy link
Copy Markdown
Contributor

A git grep 'exit(' in runtime shows there are a few other places in the runtime system where it exit(2). I think at least one of these can safely be turned into caml_fatal_error: it's the first hit in win32.c. I'm unclear about those in fail_{byt,nat}.c.

The other ones and those calling caml_sys_exit are more tricky since they also involve Stdlib.at_exit handlers which you might not expect to run before an abort(). In particular changing the default behaviour in case of uncaught exception in printexc.c might not be a great idea and this one the user program can at least prevent it from happening and messing with its own exit usage.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

Well spotted! I turned calls to exit(2) in fail_xxx.c and win32.c into fatal errors. I also fixed one typo and removed spurious calls to exit(-1) in the debugger.

@stedolan
Copy link
Copy Markdown
Contributor

I think this is a good idea: when the OCaml runtime fails, it is better to abort() than to exit with an error code.

However, caml_fatal_error is also used in the bytecode runtime for several more boring errors (e.g. if you run ocamlrun file_that_does_not_exist, you get a fatal error). I don't think these situations should trigger a SIGABRT.

I think the right thing to do here is to have caml_fatal_error call abort, but call exit(N) instead in these situations in runtime/startup_byt.c. I'm not sure what the exit code should be: ocaml currently exits with 2, but I think 127 is a more normal code for when an executable file cannot be found.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

I think this is a good idea: when the OCaml runtime fails, it is better to abort() than to exit with an error code.

However, caml_fatal_error is also used in the bytecode runtime for several more boring errors (e.g. if you run ocamlrun file_that_does_not_exist, you get a fatal error). I don't think these situations should trigger a SIGABRT.

I think the right thing to do here is to have caml_fatal_error call abort, but call exit(N) instead in these situations in runtime/startup_byt.c. I'm not sure what the exit code should be: ocaml currently exits with 2, but I think 127 is a more normal code for when an executable file cannot be found.

Done.

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with the general idea: an internal error in the runtime system should abort() rather than exit(2).

However, not all calls to caml_fatal_error represent internal errors in the runtime system. Some are errors in the OCaml program being executed, and should not abort().

One instance is when the bytecode executable cannot be found, and you already took care of that.

Another instance, detailed below, is when the C code tries to raise an OCaml exception and the OCaml code isn't initialized yet. This case was carefully reported like an uncaught exception, and you turned it into an abort(). This is debatable.

@jhjourdan jhjourdan force-pushed the fatal_error_abort branch from 83df152 to 77839c9 Compare May 28, 2019 12:46
@jhjourdan
Copy link
Copy Markdown
Contributor Author

Another instance, detailed below, is when the C code tries to raise an OCaml exception and the OCaml code isn't initialized yet. This case was carefully reported like an uncaught exception, and you turned it into an abort(). This is debatable.

As I explained, I think it should be considered UB to call OCaml runtime functions if said runtime is not initialized. Most of the external functions of the OCaml runtime requires it to be initialized anyway (allocation...). I think the test in check_global_data is only there because intern can raise an exception while decoding the bytecode executable.

If you don't agree, then I'll be happy to go back to the previous version (which is just a little less factorized). I just need to know what you think to move forward :)

@ygrek
Copy link
Copy Markdown
Contributor

ygrek commented May 28, 2019

I'm OK with the general idea: an internal error in the runtime system should abort() rather than exit(2).

ftr now I strongly support @dbuenzli idea of overridable C handler for this kind of unexpected situations. E.g. in our usecase we want to output log in some specific format so that our monitoring tools can track this situation same way as all other logs. Also from practice - we may not always want to do a coredump on oom (this contradicts what I said before, but that's what practice taught me - consuming all IO of a rood disk to get a 200G coredump is not always desirable).

@xavierleroy
Copy link
Copy Markdown
Contributor

The size of core dumps is not a factor here: just use ulimit -c N to limit the max size of core dumps, or ulimit -c 0 to make sure that core dumps are never generated.

This said, I am sympathetic to the idea that users could provide their own handler function instead of the default abort().

@jhjourdan
Copy link
Copy Markdown
Contributor Author

ftr now I strongly support @dbuenzli idea of overridable C handler for this kind of unexpected situations.

Isn't it possible to install a signal handler for SIGABRT?

@xavierleroy
Copy link
Copy Markdown
Contributor

As I explained, I think it should be considered UB to call OCaml runtime functions if said runtime is not initialized.

This is a reasonable claim that is open for discussion. I still think you need to know why the code you're modifying is written the way it is written, and be aware of what you are changing, rather than changing it without understanding it.

@ygrek
Copy link
Copy Markdown
Contributor

ygrek commented May 28, 2019

The size of core dumps is not a factor here: just use ulimit -c N to limit the max size of core dumps, or ulimit -c 0 to make sure that core dumps are never generated.

We still want to get (full) coredumps for real segfaults..

Isn't it possible to install a signal handler for SIGABRT?

maybe, but I think explicit handler is better mechanism

@dbuenzli
Copy link
Copy Markdown
Contributor

Isn't it possible to install a signal handler for SIGABRT?

In the handler the context is lost, is the abort coming from OCaml or from C ? If my main is in C I may simply choose to shutdown the OCaml system and restart it.

I think it's a good idea if the OCaml runtime strives to be treated as C library. In other words that it tries as much as possible not to fiddle with the non compositional aspects of the C runtime/standard library.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

@xavierleroy : alright, I restored the initial code for the three changes you mentioned.

@dbuenzli , @ygrek : your wish is granted. caml_fatal_error now calls caml_fatal_error_hook if not NULL. Note, however, that this hook is not called in case of unhandled exception or failed startup.

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff is getting shorter, I like it :-) More seriously: this looks very good to me.

@xavierleroy
Copy link
Copy Markdown
Contributor

Do I understand correctly that the hook mechanism answers @ygrek's concerns about core dumps? Or is there still a problem with standalone applications?

Copy link
Copy Markdown
Contributor

@ygrek ygrek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this covers all my concerns, yes

@jhjourdan jhjourdan force-pushed the fatal_error_abort branch from d86044b to 1a79cd2 Compare June 3, 2019 17:57
@jhjourdan
Copy link
Copy Markdown
Contributor Author

I rebased, squashed the commits, and mentionned the new hook in Chances. @xavierleroy : if this looks good to you , then this should be ready to merge.

@jhjourdan jhjourdan force-pushed the fatal_error_abort branch from 1a79cd2 to d1ea8d5 Compare June 6, 2019 14:31
jhjourdan added 2 commits June 8, 2019 23:04
This makes it possible to get a core dump in case of a fatal error,
which makes debugging easier.

This also
- turns a use of exit(2) to call to caml_fata_error
- replaces some uses of caml_fatal_error to a call to exit(127), when
  the error results in misuse of the runtime
- removes some useless calls to exit(-1)
When not NULL, this hook is called when a fatal error is encountered
instead of printing an error message on stderr.
@jhjourdan jhjourdan force-pushed the fatal_error_abort branch from d1ea8d5 to cddec18 Compare June 8, 2019 21:06
@jhjourdan
Copy link
Copy Markdown
Contributor Author

@xavierleroy: you approved these changes one 1.5 months ago. Is there anything blocking the merge?

@xavierleroy
Copy link
Copy Markdown
Contributor

I guess I was hoping someone else would bother to merge :-)

Let's merge, then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants