Fix stack overflow detection (MPR#7490)#1062
Conversation
xavierleroy
left a comment
There was a problem hiding this comment.
Thanks for attacking this rather nasty problem. It's a lot of ifdefs but definitely an improvement on the earlier code.
What follows is a number of suggestions for style and for clarifications.
| /* Communication with [caml_start_program] and [caml_call_gc]. */ | ||
|
|
||
| char * caml_top_of_stack; | ||
| size_t caml_stack_size; |
There was a problem hiding this comment.
What about explicit initialization plus a comment?
size_t caml_stack_size = 0; /* determined dynamically for main thread */
| struct rlimit limit; | ||
| struct sigaction act; | ||
| char * fault_addr; | ||
| int handled = 0; |
There was a problem hiding this comment.
I'm no fan of Boolean variables that encode control flow. Do you really need this variable or would the if...else... do?
| && fault_addr >= system_stack_top - limit.rlim_cur - EXTRA_STACK | ||
| && fault_addr < caml_top_of_stack) { | ||
| size_t stack_size; | ||
|
|
There was a problem hiding this comment.
Even though that's not what the original code did, I would prefer to have the code that computes the stack size first, followed by the "Sanity checks" tests, e.g.:
/* Compute the size of the stack for the thread that segfaulted */
if (caml_stack_size < 0) ... else if (caml_stack_size == 0) { ... } else { ... }
/* Sanity checks: bla bla bla */
if (test1 && test2 && ... && testN) {
The sanity checks succeed most of the time, so there is no efficiency to be gained by conditionalizing the stack size computation. Plus, this code is not performance-critical.
| if (getrlimit(RLIMIT_STACK, &limit) == 0) { | ||
| stack_size = limit.rlim_cur + EXTRA_STACK_RLIMIT; | ||
| } | ||
| else { |
There was a problem hiding this comment.
} <newline> else { is not idiomatic for this code base. Please write } else { like everywhere else.
|
|
||
| int local_var; | ||
|
|
||
| *top_of_stack = (void*) &local_var; |
There was a problem hiding this comment.
I was told that an optimizing compiler of the "bug the programmer" kind could delete this assignment or assign anything to *top_of_stack, because it is undefined behavior for the caller of this function to use the value *top_of_stack (it points to an object that no longer exists). For GCC and Clang (if GNUC is defined) you could use __builtin_frame_address(0).
Also, these default assignments to top_of_stack and stack_size could (should?) come last in the function, see comment below.
| *top_of_stack = (void*) &local_var; | ||
| *stack_size = -1; | ||
|
|
||
| #if defined(HAS_PTHREAD_GET_STACKSIZE_NP) \ |
There was a problem hiding this comment.
This #if nest is a pain to read because there is no consistent return style: sometimes return means failure and sometimes return means success. I suggest to stick to return mean success and fallthrough means keep trying:
if (PTHREAD_GETATTR_NP(pthread_self(), &attr) == 0 &&
pthread_attr_getstack(&attr, top_of_stack, stack_size) == 0) {
*top_of_stack += *stack_size;
return; /* Success! */
}
This works even if several #if feature tests succeed, and does the right thing: the first feature is tried, if it succeeds we return, otherwise we fall through, and the second feature is tried. Finally, the default can be at the end of the function.
| #ifdef NATIVE_CODE | ||
| char * top_of_stack; /* Top of stack for this thread (approx.) */ | ||
| char * bottom_of_stack; /* Saved value of caml_bottom_of_stack */ | ||
| size_t stack_size; /* Stack size in bytes (0 if unknown) */ |
There was a problem hiding this comment.
Comment may be misleading. Both 0 and -1 mean unknown to some extent.
| #ifdef NATIVE_CODE | ||
| /* Record top of stack (approximative) */ | ||
| th->top_of_stack = &tos; | ||
| /* Establish stack to be used for running signal handlers (in particular |
There was a problem hiding this comment.
No need to set an alternate stack if HAS_STACK_OVERFLOW_DETECTION is not defined.
|
|
||
| void st_determine_thread_stack_size(void** top_of_stack, size_t* stack_size) | ||
| { | ||
| #ifdef _WIN64 |
There was a problem hiding this comment.
This totally cryptic code MUST come with a comment and a URL to a page that explains NT_TIB and all that stuff.
But: what's wrong with taking the address of a local variable, or using __builtin_frame_address(0) like in the Unix code?
|
I'm probably missing something, but I don't understand why the simpler approach of comparing with the stack pointer doesn't work. Inside the signal handler, you have access to In practice, stack faults happen close to the bottom of the stack, so you may be able to delete even |
|
@xavierleroy Thanks for your comments, which I shall address. In the meantime could you comment as to why the existing code didn't use the method that @stedolan suggests? |
|
As far as I can remember, @stedolan 's approach was never considered before. It could very well be a case of missing the obvious. So, maybe we should discuss the alternate approach first before investing more time in the code from this PR. Concerning the alternate approach: be careful that the faulting address can very well be slightly below the SP register, e.g. a Apart from this, the only situation that I can see where the SP-based determination would fail is if the value of SP, as recovered from the signal context, was completely off. I'm still trying to imagine how this could happen. |
|
This library: Two other random links I read: I was wondering if the fault address/stack pointer check might be unreliable if an architecture has a means of writing to a far-away stack location (causing a fault) before the stack pointer is updated. Do store instructions with writeback on ARM behave like that? |
I would suspect so. Likewise for the PowerPC "store-with-update" instructions that I mentioned and that C compilers often use to allocate stack frames. However, ocamlopt doesn't generate those PowerPC and ARM instructions as far as I remember. Also, "far away" is limited by how big an offset can be in an addressing mode: a few Kbytes for ARM, +/- 32 Kb for PowerPC. (Again: as far as I remember.) So: not a big concern. |
|
Speaking of libsigsegv: too bad it's GPL, because it's probably better and more portable than anything we can come up with ourselves. |
I think "far away" is closer than that. Compilers can't generate code that touches memory far below the stack pointer, since that memory can be overwritten by a signal handler between any two instructions. So, stack references always happen above |
|
Yes, there is the red zone, and we should take it into account. But Mark and I were speculating about another phenomenon. On PowerPC 32 bits, for example, a popular way of allocating a stack frame of size N bytes is where r1 is the stack pointer. What this does is "store r1 at r1 - N and decrement r1 by N". (So, in one instruction the stack frame is allocated and linked to the previous frame.) ARM has similar store-then-update-address-register instructions. It is likely that if a stack overflow occurs, r1 is unchanged but the faulting write is at address r1-N, i.e. at distance N from r1. But again OCaml doesn't use this |
|
Ah, I see what you mean. I had a look at how Linux does this (the kernel needs to tell whether a fault is a stack reference or not, in order to decide whether to extend the stack to cover the faulting address or to http://lxr.free-electrons.com/source/arch/powerpc/mm/fault.c#L75 |
|
Xavier Leroy (2017/02/28 09:59 -0800):
Speaking of libsigsegv: too bad it's GPL, because it's probably better
and more portable than anything we can come up ourselves.
Would we be happy with LGPL?
|
|
@shindere - I don't think it's enough. Without a full linking exception, it's quite difficult to facilitate the OCaml linking exception clause (and probably also the Consortium licence). |
|
Given that libsigsegv is a static library, LGPL-without-exception would be as much of a problem as GPL. If it were a dynamic library, it should be OK, just like it is OK for us to call into the GNU C library. |
|
@xavierleroy and @dra27: so, what licencse would be okay for us?
Is it something we could ask for with a reasonable hope of success? I'm
willing to contact the authors of the library, in that case.
|
|
This PR has been stalled for two years, which is sad because there are good ideas in it. No one wants to work out the approach outlined by @stedolan ? |
|
This is actually on my stack, I will try to get back to it soon. |
|
Superseded by #8670 |
Co-authored-by: Cuihtlauac ALVARADO <cuihtmlauac@tarides.com>
Mantis 7490, using a small but yet highly involved example, demonstrates that the current detection of stack overflow is broken when the fault happens in a thread. This turns out to be caused by two problems:
SIGSEGV) will be delivered to the thread that caused them.caml_c_callshould have faulted before calling that C function (which in turn should have triggered another invocation of the signal handler and more unwinding). Unfortunately it wasn't faulting because the guard page size for the thread stack wasn't large enough. This patch now makes it match up withcaml_c_call.The
segv_handlerneeds to know what size the current stack is; a new variable is introduced for this. Unfortunately determination of the size of a thread stack is troublesome. One approach is to initialise a pthread attributes block and then query it before it is passed topthread_create, hoping that the creation function doesn't mess with the size. However the threading of the resulting size to the correct place inst_stubs.cis awkward and it won't work for threads registered with the OCaml runtime but not created by it. Instead we use platform-dependent methods to determine the thread stack size after creation; this has the nice side effect of giving us a more accurate top-of-stack value as well. At present, if C code registers a thread when the stack depth is deeper than it will be when some OCaml code running in that thread causes a segfault, then the stack overflow detection code may have a completely wrong top-of-stack value. (This problem also appears to exist for e.g.caml_startupbut this patch doesn't address that. This could be fixed using similar code but there are some intricacies, e.g. needing to use__libc_stack_endrather than the pthreads function on glibc since it's the main thread.)The means of determining stack base and size should work at least on Linux, FreeBSD, OpenBSD and macOS. I haven't yet tested on all these platforms but will do so. The code for doing this was inspired by some from the Chromium project [*] although it was written afresh.
This patch should also fix the stack size calculation in the presence of threads, subsuming GPR#961. It is currently based on 4.05 but may be more appropriate for trunk. Hopefully it will fix complaints about the stack overflow check seemingly not working correctly (there's been some such at Jane Street and I think this threads problem is probably the underlying reason).
As an aside, we are executing potentially non-async-signal-safe functions (cf.
man 7 signal) in the signal handler, but it's unclear how to do better as noted in the patch.[*] https://chromium.googlesource.com/experimental/chromium/blink/+/master/Source/platform/heap/StackFrameDepth.cpp