Skip to content

[crashing] Improve crash chaining code#19973

Merged
alexischr merged 1 commit intomono:masterfrom
alexischr:crash-chaining-refactor
Jun 24, 2020
Merged

[crashing] Improve crash chaining code#19973
alexischr merged 1 commit intomono:masterfrom
alexischr:crash-chaining-refactor

Conversation

@alexischr
Copy link
Contributor

@alexischr alexischr commented Jun 15, 2020

With this change, mono_post_native_crash_handler chains the crashing signal directly, instead of that having to happen after mono_handle_native_crash returns (which created a fragile repetitive code pattern). Finally, this way mono_post_native_crash_handler can also re-raise the signal right after chaining it. Since we remove our crashing signal handlers in mono_handle_native_crash, this should allow the OS to handle the signal as the final action.

TODO:

  • Investigate the possibly broken existing pattern where mono_chain_signal is called before mono_handle_native_crash , and if the crash is chained, mono_handle_native_crash doesn't run (example: mini-runtime.c:mono_sigfpe_signal_handler)
  • Ensure that all of the crashing signal handlers are removed at the beginning of mono_handle_native_crash

Fixes #19860

@alexischr
Copy link
Contributor Author

@lambdageek @vargaz What do you think this code means to do (from master)?

mono/mono/mini/mini-runtime.c

Lines 3424 to 3434 in 74f6d2c

if (!mono_do_crash_chaining && mono_chain_signal (MONO_SIG_HANDLER_PARAMS))
goto exit;
mono_sigctx_to_monoctx (ctx, &mctx);
if (mono_dump_start ())
mono_handle_native_crash (mono_get_signame (SIGFPE), &mctx, info);
if (mono_do_crash_chaining) {
mono_chain_signal (MONO_SIG_HANDLER_PARAMS);
goto exit;
}
}

To me it means that if crash chaining is enabled (and actually has a handler registered), we will never mono_handle_native_crash - or reach the second call to mono_chain_signal - which does nothing if there is no handler registered.

I want to remove this part:

 	if (!mono_do_crash_chaining && mono_chain_signal (MONO_SIG_HANDLER_PARAMS)) 
 		goto exit; 

and always handle the chaining sequence the same way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to:

  1. Restore the default signal action
  2. Restore the original context (i.e. restore the registers, etc. to the values when the signal was raised), so that the process resumes execution with the default signal action (which will presumably terminate the process and cause the OS to produce a crash report).

The point is to leave the process in the same state it was when the signal was raised, except without Mono's signal handler. The reason for this is that when there's a crash in native code, it's quite often very difficult to figure out what happened, and having the original state can be helpful. In particular with crashes due to bugs in Apple's software, it would be much better to be able to provide Apple engineers with a crash report that accurately reflects the process state at the time of the crash instead of a process where Mono stomped all over the place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rolf (1) is done (now more completely) in mono_handle_native_crash

(2) I will have to think about how that would be possible - restoring the context would naturally divert execution to the new (old) IP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rolf On further thought, just restoring the context might work - without advancing the IP, the same instruction will execute as before, presumably inducing the same signal which then would be handled by the OS instead of Mono.

@alexischr alexischr changed the title [crashing] Refactor crash chaining code [crashing] Improve crash chaining code Jun 22, 2020
@alexischr alexischr force-pushed the crash-chaining-refactor branch from 703269e to 59c0d62 Compare June 22, 2020 17:29
@alexischr alexischr marked this pull request as ready for review June 22, 2020 17:38
@lambdageek
Copy link
Member

Windows builds are failing:

mini-windows.c:284:1: error: conflicting types for 'mono_post_native_crash_handler'
  284 | mono_post_native_crash_handler (const char *signal, MonoContext *mctx, MONO_SIG_HANDLER_INFO_TYPE *info, gboolean crash_chaining)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from mini-windows.c:49:
mini-runtime.h:564:1: note: previous declaration of 'mono_post_native_crash_handler' was here
  564 | mono_post_native_crash_handler (const char *signal, MonoContext *mctx, MONO_SIG_HANDLER_INFO_TYPE *info, gboolean crash_chaining, void *context);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With this change, `mono_post_native_crash_handler` chains the crashing signal directly, instead of that having to happen after `mono_handle_native_crash` returns (which created a fragile repetitive code pattern). Finally, this way `mono_post_native_crash_handler` can also re-raise the signal right after chaining it. Since we remove our crashing signal handlers in `mono_handle_native_crash`, this should allow the OS to handle the signal as the final action.

Should fix mono#19860
@alexischr alexischr force-pushed the crash-chaining-refactor branch from 59c0d62 to 870f2b7 Compare June 23, 2020 20:23
@alexischr
Copy link
Contributor Author

@monojenkins backport to 2020-02

@monojenkins
Copy link
Contributor

@alexischr backporting to 2020-02 failed, the patch results in conflicts:

Applying: [crashing] Improve crash chaining
Using index info to reconstruct a base tree...
M	mono/mini/exceptions-amd64.c
M	mono/mini/exceptions-x86.c
M	mono/mini/mini-exceptions.c
M	mono/mini/mini-posix.c
M	mono/mini/mini-runtime.c
M	mono/mini/mini-runtime.h
M	mono/mini/mini-windows.c
M	mono/mini/mini.h
Falling back to patching base and 3-way merge...
Auto-merging mono/mini/mini.h
Auto-merging mono/mini/mini-windows.c
Auto-merging mono/mini/mini-runtime.h
Auto-merging mono/mini/mini-runtime.c
CONFLICT (content): Merge conflict in mono/mini/mini-runtime.c
Auto-merging mono/mini/mini-posix.c
Auto-merging mono/mini/mini-exceptions.c
Auto-merging mono/mini/exceptions-x86.c
Auto-merging mono/mini/exceptions-amd64.c
error: Failed to merge in the changes.
Patch failed at 0001 [crashing] Improve crash chaining

Please backport manually!

@alexischr alexischr merged commit 7a0425e into mono:master Jun 24, 2020
monojenkins pushed a commit to monojenkins/runtime that referenced this pull request Jun 26, 2020
…ash() for s390x

* Fix exception-s390x so mono_handle_native_crash() is called according to the changes made in mono/mono#19973
* Make roslyn the default compiler for s390x
vargaz pushed a commit to dotnet/runtime that referenced this pull request Jun 30, 2020
…ash() for s390x (#38428)

* Fix exception-s390x so mono_handle_native_crash() is called according to the changes made in mono/mono#19973
* Make roslyn the default compiler for s390x

Co-authored-by: nealef <nealef@users.noreply.github.com>
kevinwkt pushed a commit to kevinwkt/runtimelab that referenced this pull request Jul 15, 2020
…ash() for s390x (#38428)

* Fix exception-s390x so mono_handle_native_crash() is called according to the changes made in mono/mono#19973
* Make roslyn the default compiler for s390x

Co-authored-by: nealef <nealef@users.noreply.github.com>
naricc pushed a commit to naricc/mono that referenced this pull request Sep 1, 2020
This reverts commit 7a0425e.

This change was causing this android issue: mono#20275
naricc pushed a commit that referenced this pull request Sep 4, 2020
* Changed to i8

* Revert "[crashing] Improve crash chaining (#19973)"

This reverts commit 7a0425e.

This change was causing this android issue: #20275
monojenkins pushed a commit to monojenkins/mono that referenced this pull request Sep 8, 2020
This reverts commit 7a0425e.

This change was causing this android issue: mono#20275
marek-safar pushed a commit that referenced this pull request Sep 16, 2020
This reverts commit 7a0425e.

This change was causing this android issue: #20275
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mac/iOS: SIGSEGV will hang the process, not terminate it

4 participants