Skip to content

Fix stack overflow detection (MPR#7490)#1062

Closed
mshinwell wants to merge 3 commits intoocaml:4.05from
mshinwell:pr7490
Closed

Fix stack overflow detection (MPR#7490)#1062
mshinwell wants to merge 3 commits intoocaml:4.05from
mshinwell:pr7490

Conversation

@mshinwell
Copy link
Copy Markdown
Contributor

Mantis 7490, using a small but yet highly involved example, demonstrates that the current detection of stack overflow is broken when the fault happens in a thread. This turns out to be caused by two problems:

  1. It is necessary to assign an alternate stack to be used for the execution of signal handlers associated with a signal delivered to a specific thread, even though the setting of which handler is to be executed is process-wide. At least on Linux, signals arising from processor faults (e.g. SIGSEGV) will be delivered to the thread that caused them.
  2. When the example in MPR#7490 is operating correctly, there are multiple invocations of the OCaml signal handler for SIGSEGV, until sufficient stack space has been recovered for the exception handler to run without causing another signal. At least on my system what eventually happens is that a segfault arises in a runtime function outputting a string. That signal shouldn't terminate the program because caml_c_call should have faulted before calling that C function (which in turn should have triggered another invocation of the signal handler and more unwinding). Unfortunately it wasn't faulting because the guard page size for the thread stack wasn't large enough. This patch now makes it match up with caml_c_call.

The segv_handler needs to know what size the current stack is; a new variable is introduced for this. Unfortunately determination of the size of a thread stack is troublesome. One approach is to initialise a pthread attributes block and then query it before it is passed to pthread_create, hoping that the creation function doesn't mess with the size. However the threading of the resulting size to the correct place in st_stubs.c is awkward and it won't work for threads registered with the OCaml runtime but not created by it. Instead we use platform-dependent methods to determine the thread stack size after creation; this has the nice side effect of giving us a more accurate top-of-stack value as well. At present, if C code registers a thread when the stack depth is deeper than it will be when some OCaml code running in that thread causes a segfault, then the stack overflow detection code may have a completely wrong top-of-stack value. (This problem also appears to exist for e.g. caml_startup but this patch doesn't address that. This could be fixed using similar code but there are some intricacies, e.g. needing to use __libc_stack_end rather than the pthreads function on glibc since it's the main thread.)

The means of determining stack base and size should work at least on Linux, FreeBSD, OpenBSD and macOS. I haven't yet tested on all these platforms but will do so. The code for doing this was inspired by some from the Chromium project [*] although it was written afresh.

This patch should also fix the stack size calculation in the presence of threads, subsuming GPR#961. It is currently based on 4.05 but may be more appropriate for trunk. Hopefully it will fix complaints about the stack overflow check seemingly not working correctly (there's been some such at Jane Street and I think this threads problem is probably the underlying reason).

As an aside, we are executing potentially non-async-signal-safe functions (cf. man 7 signal) in the signal handler, but it's unclear how to do better as noted in the patch.

[*] https://chromium.googlesource.com/experimental/chromium/blink/+/master/Source/platform/heap/StackFrameDepth.cpp

@mshinwell mshinwell requested a review from chambart February 24, 2017 11:06
@mshinwell mshinwell self-assigned this Feb 24, 2017
Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for attacking this rather nasty problem. It's a lot of ifdefs but definitely an improvement on the earlier code.

What follows is a number of suggestions for style and for clarifications.

/* Communication with [caml_start_program] and [caml_call_gc]. */

char * caml_top_of_stack;
size_t caml_stack_size;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about explicit initialization plus a comment?

size_t caml_stack_size = 0;    /* determined dynamically for main thread */

struct rlimit limit;
struct sigaction act;
char * fault_addr;
int handled = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no fan of Boolean variables that encode control flow. Do you really need this variable or would the if...else... do?

&& fault_addr >= system_stack_top - limit.rlim_cur - EXTRA_STACK
&& fault_addr < caml_top_of_stack) {
size_t stack_size;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though that's not what the original code did, I would prefer to have the code that computes the stack size first, followed by the "Sanity checks" tests, e.g.:

   /* Compute the size of the stack for the thread that segfaulted */
  if (caml_stack_size < 0) ... else if (caml_stack_size == 0) { ... } else { ... }
  /* Sanity checks: bla bla bla */
  if (test1 && test2 && ... && testN) {

The sanity checks succeed most of the time, so there is no efficiency to be gained by conditionalizing the stack size computation. Plus, this code is not performance-critical.

if (getrlimit(RLIMIT_STACK, &limit) == 0) {
stack_size = limit.rlim_cur + EXTRA_STACK_RLIMIT;
}
else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} <newline> else { is not idiomatic for this code base. Please write } else { like everywhere else.


int local_var;

*top_of_stack = (void*) &local_var;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was told that an optimizing compiler of the "bug the programmer" kind could delete this assignment or assign anything to *top_of_stack, because it is undefined behavior for the caller of this function to use the value *top_of_stack (it points to an object that no longer exists). For GCC and Clang (if GNUC is defined) you could use __builtin_frame_address(0).

Also, these default assignments to top_of_stack and stack_size could (should?) come last in the function, see comment below.

*top_of_stack = (void*) &local_var;
*stack_size = -1;

#if defined(HAS_PTHREAD_GET_STACKSIZE_NP) \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This #if nest is a pain to read because there is no consistent return style: sometimes return means failure and sometimes return means success. I suggest to stick to return mean success and fallthrough means keep trying:

  if (PTHREAD_GETATTR_NP(pthread_self(), &attr) == 0 &&
      pthread_attr_getstack(&attr, top_of_stack, stack_size) == 0) {
   *top_of_stack += *stack_size;
    return; /* Success! */
  }

This works even if several #if feature tests succeed, and does the right thing: the first feature is tried, if it succeeds we return, otherwise we fall through, and the second feature is tried. Finally, the default can be at the end of the function.

#ifdef NATIVE_CODE
char * top_of_stack; /* Top of stack for this thread (approx.) */
char * bottom_of_stack; /* Saved value of caml_bottom_of_stack */
size_t stack_size; /* Stack size in bytes (0 if unknown) */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment may be misleading. Both 0 and -1 mean unknown to some extent.

#ifdef NATIVE_CODE
/* Record top of stack (approximative) */
th->top_of_stack = &tos;
/* Establish stack to be used for running signal handlers (in particular
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to set an alternate stack if HAS_STACK_OVERFLOW_DETECTION is not defined.


void st_determine_thread_stack_size(void** top_of_stack, size_t* stack_size)
{
#ifdef _WIN64
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This totally cryptic code MUST come with a comment and a URL to a page that explains NT_TIB and all that stuff.

But: what's wrong with taking the address of a local variable, or using __builtin_frame_address(0) like in the Unix code?

@stedolan
Copy link
Copy Markdown
Contributor

I'm probably missing something, but I don't understand why the simpler approach of comparing with the stack pointer doesn't work.

Inside the signal handler, you have access to CONTEXT_SP, which points to the bottom of the stack. You still need the platform-specific code to determine where the top of the stack is, but you can check whether the fault address is on the stack by checking whether it is between caml_top_of_stack and CONTEXT_SP, rather than trying to keep track of the stack size and the bottom of the stack.

In practice, stack faults happen close to the bottom of the stack, so you may be able to delete even caml_top_of_stack and just compare with CONTEXT_SP and CONTEXT_SP + a few k

@mshinwell
Copy link
Copy Markdown
Contributor Author

@xavierleroy Thanks for your comments, which I shall address. In the meantime could you comment as to why the existing code didn't use the method that @stedolan suggests?

@xavierleroy
Copy link
Copy Markdown
Contributor

As far as I can remember, @stedolan 's approach was never considered before. It could very well be a case of missing the obvious. So, maybe we should discuss the alternate approach first before investing more time in the code from this PR.

Concerning the alternate approach: be careful that the faulting address can very well be slightly below the SP register, e.g. a pushq or call on x86-64 atomically decreases SP by 8 then stores at SP, so when there is a fault I suspect SP is not modified hence the faulting address is SP-8. On PowerPC there is stwu and stdu that can have similar effects. Nothing damning here, just the need to have some slack around the value of SP.

Apart from this, the only situation that I can see where the SP-based determination would fail is if the value of SP, as recovered from the signal context, was completely off. I'm still trying to imagine how this could happen.

@mshinwell
Copy link
Copy Markdown
Contributor Author

This library:
https://ftp.gnu.org/gnu/libsigsegv/libsigsegv-2.11.tar.gz
in handler-unix.c has an informative comment describing three methods, including the two we are discussing here. The code is hard to follow but I think it seems to prefer methods that check the faulting address against the stack bounds, rather than checking the faulting address against the stack pointer.

Two other random links I read:
https://opensource.apple.com/source/gm4/gm4-15/src/stackovf.c
https://cygwin.com/ml/cygwin/2009-07/msg00666.html

I was wondering if the fault address/stack pointer check might be unreliable if an architecture has a means of writing to a far-away stack location (causing a fault) before the stack pointer is updated. Do store instructions with writeback on ARM behave like that?

@xavierleroy
Copy link
Copy Markdown
Contributor

I was wondering if the fault address/stack pointer check might be unreliable if an architecture has a means of writing to a far-away stack location (causing a fault) before the stack pointer is updated. Do store instructions with writeback on ARM behave like that?

I would suspect so. Likewise for the PowerPC "store-with-update" instructions that I mentioned and that C compilers often use to allocate stack frames. However, ocamlopt doesn't generate those PowerPC and ARM instructions as far as I remember. Also, "far away" is limited by how big an offset can be in an addressing mode: a few Kbytes for ARM, +/- 32 Kb for PowerPC. (Again: as far as I remember.) So: not a big concern.

@xavierleroy
Copy link
Copy Markdown
Contributor

xavierleroy commented Feb 28, 2017

Speaking of libsigsegv: too bad it's GPL, because it's probably better and more portable than anything we can come up with ourselves.

@stedolan
Copy link
Copy Markdown
Contributor

I would suspect so. Likewise for the PowerPC "store-with-update" instructions that I mentioned and that C compilers often use to allocate stack frames. However, ocamlopt doesn't generate those PowerPC and ARM instructions as far as I remember. Also, "far away" is limited by how big an offset can be in an addressing mode: a few Kbytes for ARM, +/- 32 Kb for PowerPC. (Again: as far as I remember.) So: not a big concern.

I think "far away" is closer than that. Compilers can't generate code that touches memory far below the stack pointer, since that memory can be overwritten by a signal handler between any two instructions. So, stack references always happen above sp - sizeof red zone, where the red zone is 128 bytes on amd64 and (random googling tells me) 224 bytes on powerpc.

@xavierleroy
Copy link
Copy Markdown
Contributor

Yes, there is the red zone, and we should take it into account. But Mark and I were speculating about another phenomenon. On PowerPC 32 bits, for example, a popular way of allocating a stack frame of size N bytes is

     stwu  r1, -N(r1)

where r1 is the stack pointer. What this does is "store r1 at r1 - N and decrement r1 by N". (So, in one instruction the stack frame is allocated and linked to the previous frame.) ARM has similar store-then-update-address-register instructions. It is likely that if a stack overflow occurs, r1 is unchanged but the faulting write is at address r1-N, i.e. at distance N from r1. But again OCaml doesn't use this stwu trick, I think.

@stedolan
Copy link
Copy Markdown
Contributor

Ah, I see what you mean.

I had a look at how Linux does this (the kernel needs to tell whether a fault is a stack reference or not, in order to decide whether to extend the stack to cover the faulting address or to SIGSEGV). Turns out it special-cases stwu r1, -N(r1):

http://lxr.free-electrons.com/source/arch/powerpc/mm/fault.c#L75
http://lxr.free-electrons.com/source/arch/powerpc/mm/fault.c#L359

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Mar 3, 2017 via email

@dra27
Copy link
Copy Markdown
Member

dra27 commented Mar 3, 2017

@shindere - I don't think it's enough. Without a full linking exception, it's quite difficult to facilitate the OCaml linking exception clause (and probably also the Consortium licence).

@xavierleroy
Copy link
Copy Markdown
Contributor

Given that libsigsegv is a static library, LGPL-without-exception would be as much of a problem as GPL. If it were a dynamic library, it should be OK, just like it is OK for us to call into the GNU C library.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Mar 3, 2017 via email

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 29, 2017
@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone Jun 4, 2018
@xavierleroy
Copy link
Copy Markdown
Contributor

This PR has been stalled for two years, which is sad because there are good ideas in it. No one wants to work out the approach outlined by @stedolan ?

@mshinwell
Copy link
Copy Markdown
Contributor Author

This is actually on my stack, I will try to get back to it soon.

@xavierleroy
Copy link
Copy Markdown
Contributor

xavierleroy commented Oct 20, 2019

Superseded by #8670

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
…ml#1062)

Revert "Make [Invalid] take a structured form for invalid terms (ocaml#975)"

This reverts commit 469e56e.
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Co-authored-by: Cuihtlauac ALVARADO <cuihtmlauac@tarides.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants