Fix stack overflow detection (MPR#7490) by mshinwell · Pull Request #1062 · ocaml/ocaml

mshinwell · 2017-02-24T11:06:31Z

Mantis 7490, using a small but yet highly involved example, demonstrates that the current detection of stack overflow is broken when the fault happens in a thread. This turns out to be caused by two problems:

It is necessary to assign an alternate stack to be used for the execution of signal handlers associated with a signal delivered to a specific thread, even though the setting of which handler is to be executed is process-wide. At least on Linux, signals arising from processor faults (e.g. SIGSEGV) will be delivered to the thread that caused them.
When the example in MPR#7490 is operating correctly, there are multiple invocations of the OCaml signal handler for SIGSEGV, until sufficient stack space has been recovered for the exception handler to run without causing another signal. At least on my system what eventually happens is that a segfault arises in a runtime function outputting a string. That signal shouldn't terminate the program because caml_c_call should have faulted before calling that C function (which in turn should have triggered another invocation of the signal handler and more unwinding). Unfortunately it wasn't faulting because the guard page size for the thread stack wasn't large enough. This patch now makes it match up with caml_c_call.

The segv_handler needs to know what size the current stack is; a new variable is introduced for this. Unfortunately determination of the size of a thread stack is troublesome. One approach is to initialise a pthread attributes block and then query it before it is passed to pthread_create, hoping that the creation function doesn't mess with the size. However the threading of the resulting size to the correct place in st_stubs.c is awkward and it won't work for threads registered with the OCaml runtime but not created by it. Instead we use platform-dependent methods to determine the thread stack size after creation; this has the nice side effect of giving us a more accurate top-of-stack value as well. At present, if C code registers a thread when the stack depth is deeper than it will be when some OCaml code running in that thread causes a segfault, then the stack overflow detection code may have a completely wrong top-of-stack value. (This problem also appears to exist for e.g. caml_startup but this patch doesn't address that. This could be fixed using similar code but there are some intricacies, e.g. needing to use __libc_stack_end rather than the pthreads function on glibc since it's the main thread.)

The means of determining stack base and size should work at least on Linux, FreeBSD, OpenBSD and macOS. I haven't yet tested on all these platforms but will do so. The code for doing this was inspired by some from the Chromium project [*] although it was written afresh.

This patch should also fix the stack size calculation in the presence of threads, subsuming GPR#961. It is currently based on 4.05 but may be more appropriate for trunk. Hopefully it will fix complaints about the stack overflow check seemingly not working correctly (there's been some such at Jane Street and I think this threads problem is probably the underlying reason).

As an aside, we are executing potentially non-async-signal-safe functions (cf. man 7 signal) in the signal handler, but it's unclear how to do better as noted in the patch.

[*] https://chromium.googlesource.com/experimental/chromium/blink/+/master/Source/platform/heap/StackFrameDepth.cpp

xavierleroy

Thanks for attacking this rather nasty problem. It's a lot of ifdefs but definitely an improvement on the earlier code.

What follows is a number of suggestions for style and for clarifications.

xavierleroy · 2017-02-25T16:17:48Z

asmrun/roots.c

 /* Communication with [caml_start_program] and [caml_call_gc]. */

 char * caml_top_of_stack;
+size_t caml_stack_size;


What about explicit initialization plus a comment?

size_t caml_stack_size = 0; /* determined dynamically for main thread */

xavierleroy · 2017-02-25T16:18:36Z

asmrun/signals_asm.c

  struct rlimit limit;
  struct sigaction act;
  char * fault_addr;
+  int handled = 0;


I'm no fan of Boolean variables that encode control flow. Do you really need this variable or would the if...else... do?

xavierleroy · 2017-02-25T16:21:41Z

asmrun/signals_asm.c

-      && fault_addr >= system_stack_top - limit.rlim_cur - EXTRA_STACK
+      && fault_addr < caml_top_of_stack) {
+    size_t stack_size;
+


Even though that's not what the original code did, I would prefer to have the code that computes the stack size first, followed by the "Sanity checks" tests, e.g.:

/* Compute the size of the stack for the thread that segfaulted */ if (caml_stack_size < 0) ... else if (caml_stack_size == 0) { ... } else { ... } /* Sanity checks: bla bla bla */ if (test1 && test2 && ... && testN) {

The sanity checks succeed most of the time, so there is no efficiency to be gained by conditionalizing the stack size computation. Plus, this code is not performance-critical.

xavierleroy · 2017-02-25T16:22:20Z

asmrun/signals_asm.c

+      if (getrlimit(RLIMIT_STACK, &limit) == 0) {
+        stack_size = limit.rlim_cur + EXTRA_STACK_RLIMIT;
+      }
+      else {


} <newline> else { is not idiomatic for this code base. Please write } else { like everywhere else.

xavierleroy · 2017-02-25T16:31:02Z

otherlibs/systhreads/st_posix.h

+
+  int local_var;
+
+  *top_of_stack = (void*) &local_var;


I was told that an optimizing compiler of the "bug the programmer" kind could delete this assignment or assign anything to *top_of_stack, because it is undefined behavior for the caller of this function to use the value *top_of_stack (it points to an object that no longer exists). For GCC and Clang (if GNUC is defined) you could use __builtin_frame_address(0).

Also, these default assignments to top_of_stack and stack_size could (should?) come last in the function, see comment below.

xavierleroy · 2017-02-25T16:34:34Z

otherlibs/systhreads/st_posix.h

+  *top_of_stack = (void*) &local_var;
+  *stack_size = -1;
+
+#if defined(HAS_PTHREAD_GET_STACKSIZE_NP) \


This #if nest is a pain to read because there is no consistent return style: sometimes return means failure and sometimes return means success. I suggest to stick to return mean success and fallthrough means keep trying:

if (PTHREAD_GETATTR_NP(pthread_self(), &attr) == 0 && pthread_attr_getstack(&attr, top_of_stack, stack_size) == 0) { *top_of_stack += *stack_size; return; /* Success! */ }

This works even if several #if feature tests succeed, and does the right thing: the first feature is tried, if it succeeds we return, otherwise we fall through, and the second feature is tried. Finally, the default can be at the end of the function.

xavierleroy · 2017-02-25T16:35:53Z

otherlibs/systhreads/st_stubs.c

 #ifdef NATIVE_CODE
  char * top_of_stack;          /* Top of stack for this thread (approx.) */
  char * bottom_of_stack;       /* Saved value of caml_bottom_of_stack */
+  size_t stack_size;            /* Stack size in bytes (0 if unknown) */


Comment may be misleading. Both 0 and -1 mean unknown to some extent.

xavierleroy · 2017-02-25T16:37:00Z

otherlibs/systhreads/st_stubs.c

 #ifdef NATIVE_CODE
-  /* Record top of stack (approximative) */
-  th->top_of_stack = &tos;
+  /* Establish stack to be used for running signal handlers (in particular


No need to set an alternate stack if HAS_STACK_OVERFLOW_DETECTION is not defined.

xavierleroy · 2017-02-25T16:39:01Z

otherlibs/systhreads/st_win32.h

+
+void st_determine_thread_stack_size(void** top_of_stack, size_t* stack_size)
+{
+#ifdef _WIN64


This totally cryptic code MUST come with a comment and a URL to a page that explains NT_TIB and all that stuff.

But: what's wrong with taking the address of a local variable, or using __builtin_frame_address(0) like in the Unix code?

stedolan · 2017-02-27T13:26:51Z

I'm probably missing something, but I don't understand why the simpler approach of comparing with the stack pointer doesn't work.

Inside the signal handler, you have access to CONTEXT_SP, which points to the bottom of the stack. You still need the platform-specific code to determine where the top of the stack is, but you can check whether the fault address is on the stack by checking whether it is between caml_top_of_stack and CONTEXT_SP, rather than trying to keep track of the stack size and the bottom of the stack.

In practice, stack faults happen close to the bottom of the stack, so you may be able to delete even caml_top_of_stack and just compare with CONTEXT_SP and CONTEXT_SP + a few k

mshinwell · 2017-02-28T08:34:47Z

@xavierleroy Thanks for your comments, which I shall address. In the meantime could you comment as to why the existing code didn't use the method that @stedolan suggests?

xavierleroy · 2017-02-28T11:10:46Z

As far as I can remember, @stedolan 's approach was never considered before. It could very well be a case of missing the obvious. So, maybe we should discuss the alternate approach first before investing more time in the code from this PR.

Concerning the alternate approach: be careful that the faulting address can very well be slightly below the SP register, e.g. a pushq or call on x86-64 atomically decreases SP by 8 then stores at SP, so when there is a fault I suspect SP is not modified hence the faulting address is SP-8. On PowerPC there is stwu and stdu that can have similar effects. Nothing damning here, just the need to have some slack around the value of SP.

Apart from this, the only situation that I can see where the SP-based determination would fail is if the value of SP, as recovered from the signal context, was completely off. I'm still trying to imagine how this could happen.

mshinwell · 2017-02-28T13:55:41Z

This library:
https://ftp.gnu.org/gnu/libsigsegv/libsigsegv-2.11.tar.gz
in handler-unix.c has an informative comment describing three methods, including the two we are discussing here. The code is hard to follow but I think it seems to prefer methods that check the faulting address against the stack bounds, rather than checking the faulting address against the stack pointer.

Two other random links I read:
https://opensource.apple.com/source/gm4/gm4-15/src/stackovf.c
https://cygwin.com/ml/cygwin/2009-07/msg00666.html

I was wondering if the fault address/stack pointer check might be unreliable if an architecture has a means of writing to a far-away stack location (causing a fault) before the stack pointer is updated. Do store instructions with writeback on ARM behave like that?

xavierleroy · 2017-02-28T17:56:33Z

I was wondering if the fault address/stack pointer check might be unreliable if an architecture has a means of writing to a far-away stack location (causing a fault) before the stack pointer is updated. Do store instructions with writeback on ARM behave like that?

I would suspect so. Likewise for the PowerPC "store-with-update" instructions that I mentioned and that C compilers often use to allocate stack frames. However, ocamlopt doesn't generate those PowerPC and ARM instructions as far as I remember. Also, "far away" is limited by how big an offset can be in an addressing mode: a few Kbytes for ARM, +/- 32 Kb for PowerPC. (Again: as far as I remember.) So: not a big concern.

xavierleroy · 2017-02-28T17:59:36Z

Speaking of libsigsegv: too bad it's GPL, because it's probably better and more portable than anything we can come up with ourselves.

stedolan · 2017-02-28T18:05:00Z

I would suspect so. Likewise for the PowerPC "store-with-update" instructions that I mentioned and that C compilers often use to allocate stack frames. However, ocamlopt doesn't generate those PowerPC and ARM instructions as far as I remember. Also, "far away" is limited by how big an offset can be in an addressing mode: a few Kbytes for ARM, +/- 32 Kb for PowerPC. (Again: as far as I remember.) So: not a big concern.

I think "far away" is closer than that. Compilers can't generate code that touches memory far below the stack pointer, since that memory can be overwritten by a signal handler between any two instructions. So, stack references always happen above sp - sizeof red zone, where the red zone is 128 bytes on amd64 and (random googling tells me) 224 bytes on powerpc.

xavierleroy · 2017-02-28T18:10:26Z

Yes, there is the red zone, and we should take it into account. But Mark and I were speculating about another phenomenon. On PowerPC 32 bits, for example, a popular way of allocating a stack frame of size N bytes is

     stwu  r1, -N(r1)

where r1 is the stack pointer. What this does is "store r1 at r1 - N and decrement r1 by N". (So, in one instruction the stack frame is allocated and linked to the previous frame.) ARM has similar store-then-update-address-register instructions. It is likely that if a stack overflow occurs, r1 is unchanged but the faulting write is at address r1-N, i.e. at distance N from r1. But again OCaml doesn't use this stwu trick, I think.

stedolan · 2017-02-28T18:37:56Z

Ah, I see what you mean.

I had a look at how Linux does this (the kernel needs to tell whether a fault is a stack reference or not, in order to decide whether to extend the stack to cover the faulting address or to SIGSEGV). Turns out it special-cases stwu r1, -N(r1):

http://lxr.free-electrons.com/source/arch/powerpc/mm/fault.c#L75
http://lxr.free-electrons.com/source/arch/powerpc/mm/fault.c#L359

shindere · 2017-03-03T11:09:12Z

Xavier Leroy (2017/02/28 09:59 -0800):

Speaking of libsigsegv: too bad it's GPL, because it's probably better and more portable than anything we can come up ourselves.

Would we be happy with LGPL?

dra27 · 2017-03-03T16:53:03Z

@shindere - I don't think it's enough. Without a full linking exception, it's quite difficult to facilitate the OCaml linking exception clause (and probably also the Consortium licence).

xavierleroy · 2017-03-03T18:22:35Z

Given that libsigsegv is a static library, LGPL-without-exception would be as much of a problem as GPL. If it were a dynamic library, it should be OK, just like it is OK for us to call into the GNU C library.

shindere · 2017-03-03T22:41:23Z

@xavierleroy and @dra27: so, what licencse would be okay for us? Is it something we could ask for with a reasonable hope of success? I'm willing to contact the authors of the library, in that case.

xavierleroy · 2019-03-16T17:46:38Z

This PR has been stalled for two years, which is sad because there are good ideas in it. No one wants to work out the approach outlined by @stedolan ?

mshinwell · 2019-03-18T08:44:21Z

This is actually on my stack, I will try to get back to it soon.

xavierleroy · 2019-10-20T07:49:51Z

Superseded by #8670

…ml#1062) Revert "Make [Invalid] take a structured form for invalid terms (ocaml#975)" This reverts commit 469e56e.

Co-authored-by: Cuihtlauac ALVARADO <cuihtmlauac@tarides.com>

Mark Shinwell added 2 commits February 24, 2017 10:38

First version of fix for PR7490

1209106

Missed revert

ed9769c

mshinwell requested a review from chambart February 24, 2017 11:06

mshinwell self-assigned this Feb 24, 2017

Changes

53c09d3

mshinwell mentioned this pull request Feb 24, 2017

Fix stack size calculation in the presence of threads #961

Closed

wintersteiger mentioned this pull request Feb 24, 2017

Fix for stack overflow problems due large output from Z3. FStarLang/FStar#870

Merged

xavierleroy reviewed Feb 25, 2017

View reviewed changes

damiendoligez added this to the 4.07-or-later milestone Sep 29, 2017

damiendoligez added the high-priority label Mar 1, 2018

damiendoligez added the bug label Jun 4, 2018

damiendoligez removed this from the consider-for-4.07 milestone Jun 4, 2018

This was referenced Mar 14, 2019

Segfault upon stack overflow in child thread. #7490

Closed

wrong stack_size when on another thread #7527

Closed

stedolan mentioned this pull request May 14, 2019

Fix stack overflow detection with systhreads #8670

Merged

xavierleroy closed this Oct 20, 2019

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023

Revert "Make [Invalid] take a structured form for invalid terms" (oca…

4da84c4

…ml#1062) Revert "Make [Invalid] take a structured form for invalid terms (ocaml#975)" This reverts commit 469e56e.

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Bringup peter-evans/create-pull-request to v5 (ocaml#1062)

ce1caba

Co-authored-by: Cuihtlauac ALVARADO <cuihtmlauac@tarides.com>

Conversation

mshinwell commented Feb 24, 2017

Uh oh!

xavierleroy left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stedolan commented Feb 27, 2017

Uh oh!

mshinwell commented Feb 28, 2017

Uh oh!

xavierleroy commented Feb 28, 2017

Uh oh!

mshinwell commented Feb 28, 2017

Uh oh!

xavierleroy commented Feb 28, 2017

Uh oh!

xavierleroy commented Feb 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stedolan commented Feb 28, 2017

Uh oh!

xavierleroy commented Feb 28, 2017

Uh oh!

stedolan commented Feb 28, 2017

Uh oh!

shindere commented Mar 3, 2017 via email

Uh oh!

dra27 commented Mar 3, 2017

Uh oh!

xavierleroy commented Mar 3, 2017

Uh oh!

shindere commented Mar 3, 2017 via email

Uh oh!

xavierleroy commented Mar 16, 2019

Uh oh!

mshinwell commented Mar 18, 2019

Uh oh!

xavierleroy commented Oct 20, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

xavierleroy commented Feb 28, 2017 •

edited

Loading

xavierleroy commented Oct 20, 2019 •

edited

Loading