Use signals for activation injection on macOS #46657
This change moves macOS activation injection to the signal plan, matching how it works on other Unix platforms. The previous approach, which suspends the thread and redirects it through a helper frame, can collide with signal handlers running on the same thread and leave a corrupted stack frame. The issue can be reproduced by sending signals to the .NET process from another process while the .NET process is doing a lot of GCs.
#ifdef TARGET_ARM64
// RtlRestoreContext assembly corrupts X16 & X17, so it cannot be
// used for Activation restore
MachSetThreadContext(context);
How is context restored for activation using signals? How are we preventing X16/X17 corruption?
Maybe we are just returning from the signal handler and the kernel is taking care of it?
I am actually wondering if we should change the JIT to not use X17 unless it is marked unsafe for GC.
The context is restored by the kernel. We return from the signal handler and let the kernel do the job. We just copy CONTEXT to the ucontext_t passed to the signal handler before returning.
* platforms and another type elsewhere. */
#if HAVE_UCONTEXT_T
#include <ucontext.h>
#include <sys/ucontext.h>
@janvorli, was this part of the change required for activation via signal?
As far as I can tell, all systems that have sys/ucontext.h also have a toolchain-specific ucontext.h, which includes sys/ucontext.h plus some additional defines. This is the case on macOS, Linux and illumos. FreeBSD differs in that sys/ucontext.h includes machine/ucontext.h, but it was working fine there before with <ucontext.h>.
I am asking because this has been breaking the illumos build since Friday. From the logs:
2021-01-08T19:02:42.0116455Z [ 40%] Building CXX object pal/src/CMakeFiles/coreclrpal.dir/thread/context.cpp.o
2021-01-08T19:02:42.2653807Z In file included from /runtime/src/coreclr/pal/src/thread/context.cpp:25:
2021-01-08T19:02:42.2655771Z /runtime/src/coreclr/pal/src/thread/context.cpp: In function 'void CONTEXTToNativeContext(const CONTEXT*, native_context_t*)':
2021-01-08T19:02:42.2657287Z /runtime/src/coreclr/pal/src/include/pal/context.h:120:41: error: 'REG_RBP' was not declared in this scope
2021-01-08T19:02:42.2658120Z #define MCREG_Rbp(mc) ((mc).gregs[REG_RBP])
2021-01-08T19:02:42.2658595Z ^~~~~~~
This is because the register definitions are included separately, after the inclusion of sys/ucontext.h: https://github.com/illumos/illumos-gate/blob/260693/usr/src/head/ucontext.h#L35-L48.
Opened #46790 with a potential fix. If activation does not particularly require sys/ucontext.h, I can simplify my patch to make it how it was before (#include <ucontext.h> without the #elif).
Without this change, it is not compiling on macOS. I am getting:
In file included from /Users/janvorli/git/runtime2/src/coreclr/pal/src/exception/signal.cpp:51:
In file included from /Users/janvorli/git/runtime2/src/coreclr/pal/src/include/pal/context.h:34:
/Users/janvorli/Downloads/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk/usr/include/ucontext.h:51:2: error: The deprecated ucontext routines require _XOPEN_SOURCE to be defined
#error The deprecated ucontext routines require _XOPEN_SOURCE to be defined
^
1 error generated.
On macOS, ucontext.h contains prototypes for makecontext, swapcontext, etc., which are deprecated. sys/ucontext.h contains the definition of the actual data structure.
Maybe it would be better to go back to including the <ucontext.h> and just define the _XOPEN_SOURCE symbol before including it.
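A rough sketch of that alternative, assuming the macro only has to be defined before `<ucontext.h>` is first included. The value `700` and the helper `native_context_size` are illustrative assumptions, not something taken from this PR. On macOS, defining `_XOPEN_SOURCE` can also hide non-POSIX declarations in other headers, so a real change would need to check the rest of the translation unit.

```c
/* Define the feature macro before the header that requires it.
 * The value 700 is an assumption; the macOS header shown in the error
 * above only requires that the macro be defined. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <ucontext.h>
#include <stddef.h>

/* Hypothetical helper: proves that ucontext_t is a complete type after
 * the include, which is all the PAL needs from this header. */
static size_t native_context_size(void)
{
    return sizeof(ucontext_t);
}
```

This keeps the single `<ucontext.h>` include that illumos relies on for its register definitions, at the cost of a feature-test macro that affects every header included after it.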
It seems to be compiling with SDK 10.4 locally and CI is green in the PR. Let me try with SDK 11.0.