I spent some time during the last week tracking down an issue reported by @pavlexander (#16462) that he hit when running his .NET app on Debian 9.3.
It manifested itself as a sigsegv when running a static constructor. It was intermittent, happening on some runs and not on others, and it never happened under a debugger until @pavlexander found that this was because debuggers disable ASLR by default. After enabling ASLR, it started to repro under the debugger too, but only on Debian 9.3, not on e.g. Ubuntu 14.04 or 16.04.
This static constructor has an enormous frame size, almost 128kB. So depending on where ASLR placed the initial RSP for the process, the used stack size either crossed the initial stack size of 128kB or not. If it did, the process crashed during the stack probing. That was pretty strange, since this was running on the primary thread and the maximum stack size was 8MB. So on the affected system, the kernel did not convert the fault into a stack expansion.
I’ve written a small test app in C that does basically the same thing as the probing generated by the JIT. On all my systems other than Debian 9.3 (I even tried Debian 9.0 and it was ok), it worked fine. On Debian 9.3, it crashed with sigsegv.
To make the story shorter, I’ve found that it depends on the kernel version. By building and testing various kernel versions, I’ve found that it started to happen with kernel 4.9.34. Further bisection identified the Linux kernel commit that changed the behavior (torvalds/linux@cfc0eb4). It fixed a potential issue by rewriting the guard page handling. I spent some time reading and debugging the related portions of the kernel source, and it turns out that we were basically just lucky that it worked before this change. There is this comment close to the stack expansion invocation:
Accessing the stack below %sp is always a bug. The large cushion allows instructions like enter and pusha to work. ("enter $65535, $31" pushes 32 pointers and then decrements %sp by 65535.)
There is a check that tests whether the fault address is more than (65536 + 32 * 8) bytes below the stack pointer, and if it is, the kernel refuses to expand the stack. But before this check, there is another check testing whether the fault address is inside the range stored in the virtual memory descriptor for the stack. And that’s where the new and old kernels differ. The new kernel includes only the committed portion of the stack virtual memory range, while the old one included the guard page in that range too.
That means that on the old kernel, our probing hit the guard page, and since it was inside the range described by the virtual memory descriptor for the stack, the test for the distance between SP and the fault address was skipped, the new page was committed and the range was expanded by another guard page.
On the new kernel, we hit the guard page, but it is no longer part of the range for the stack, so the kernel checks the distance between the fault address and the RSP, finds that it is too large, and delivers sigsegv to our process.
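To make the difference concrete, here is a rough model of the two checks in C. It is only a sketch of the decision described above, not the kernel code: `struct stack_vma`, `too_far_below_sp` and `fault_expands_stack` are names made up for illustration, and the `guard_in_range` flag stands in for the old vs. new kernel behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cushion from the kernel comment: 65536 bytes plus 32 pointers of 8 bytes. */
#define SP_CUSHION (65536u + 32u * 8u)

/* Hypothetical, simplified view of the stack's virtual memory descriptor. */
struct stack_vma {
    uint64_t start;      /* lowest committed stack address */
    uint64_t guard_size; /* size of the guard area just below start */
};

/* True when the fault address is too far below SP for the stack to grow. */
static bool too_far_below_sp(uint64_t fault_addr, uint64_t sp)
{
    return fault_addr + SP_CUSHION < sp;
}

/*
 * Models whether a fault below the committed stack gets turned into a
 * stack expansion. guard_in_range = true models the old kernels, where
 * the guard page was part of the stack range; false models the new ones.
 */
static bool fault_expands_stack(const struct stack_vma *vma,
                                uint64_t fault_addr, uint64_t sp,
                                bool guard_in_range)
{
    uint64_t range_start = guard_in_range ? vma->start - vma->guard_size
                                          : vma->start;

    /* Inside the stack range: the SP distance check is never reached. */
    if (fault_addr >= range_start)
        return true;

    /* Outside the range: only expand if the fault is close enough to SP. */
    return !too_far_below_sp(fault_addr, sp);
}
```

With SP 128kB above the bottom of the committed stack and a probe landing in the guard page, this model expands the stack when `guard_in_range` is true and refuses when it is false, which matches what I saw.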
I’ve tested even the latest Linux kernel 4.15 and this new behavior persists.
That means that in order to make stack probing work correctly on the new kernels too, we will need to modify the probing so that it moves the RSP as it probes (or at least once every 64kB). Btw, it looks like this check of the address to SP distance is there only for x86 / x64 and not for ARM / ARM64.
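As a sanity check of that idea, here is a small simulation in C that compares probing with a fixed RSP against probing that moves the RSP along. It is not the JIT's probing code and does not touch the real stack; it only replays the distance check on plain integers, assuming the worst case where every probe faults outside the stack range. The function names and the 4096-byte page size are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u
#define SP_CUSHION (65536u + 32u * 8u)

/*
 * Probe one address per page of the frame while SP stays where it was.
 * Returns false if some probe would be rejected by the distance check.
 */
static bool probe_with_fixed_sp(uint64_t sp, uint64_t frame_size)
{
    for (uint64_t off = PAGE_SIZE_BYTES; off <= frame_size; off += PAGE_SIZE_BYTES) {
        uint64_t probe = sp - off;
        if (probe + SP_CUSHION < sp)
            return false;
    }
    return true;
}

/*
 * Same probing, but the simulated SP is moved down to the probed address
 * whenever the probe is at least `step` bytes below it.
 */
static bool probe_moving_sp(uint64_t sp, uint64_t frame_size, uint64_t step)
{
    uint64_t cur_sp = sp;

    for (uint64_t off = PAGE_SIZE_BYTES; off <= frame_size; off += PAGE_SIZE_BYTES) {
        uint64_t probe = sp - off;
        if (probe + SP_CUSHION < cur_sp)
            return false;
        if (cur_sp - probe >= step)
            cur_sp = probe; /* move SP along with the probing */
    }
    return true;
}
```

In this model a 128kB frame fails with a fixed SP, but passes when SP is moved after every page or once every 64kB. A step of 128kB is too coarse and fails again.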