Fix stack overflow detection with systhreads#8670
Conversation
xavierleroy
left a comment
There was a problem hiding this comment.
Nice to see this work restart. A couple of questions below.
|
846ce83 (the change to |
|
When I first thought about the change, my idea was that callstacks are typically (1) fairly large (often larger than Max_young_wosize?) and (2) likely to get stored later for auditing, and thus long-lived, or to come just before the program errors out, and thus we don't care. (I had another paragraph here which I guess was confused about when backtraces are generated by the compiler when using exceptions for control-flow, and people are not using |
|
The situation with exceptions are used for control flow, where backtraces are unnecessarily generated and then discarded, is indeed annoyingly common. The importance of performance in this situation is why This change affects only Printexc.get_raw_backtrace (and a few other functions in |
|
Will the backtraces collected by statmemprof be affected? (Those are expected to follow the lifetime of the memory block they track, so we can expect that many of them will be short-lived.) |
|
No, statmemprof does not use this mechanism to collect callstacks. Statmemprof is interested in the current callstack, so it walks the stack to collect it. These functions concern the backtrace of the most recent exception, which is stored in a global buffer. |
74efb3b to
e1d6305
Compare
|
I changed this patch to remove the sharing of sigaltstacks. The reasoning why this was OK was based on considering only SIGSEGV; there are other signals and they share a stack. |
This is true.
This is perhaps less true these days. For example, |
|
I looked for usage of Misc.try_finally (a parent of Fun.protect) I haven't seen a use of Fun.protect around exceptions used for Let me know if you like the idea and I'll shoot a separate PR or issue report. |
Currently (both before and after this patch), if |
|
Ah, indeed. Sorry for the internal confusedness. |
In the following review of uses of
Yes, this is a valid use-case, actually I have found 5 of them in the aforementioned audit. Control flow is the intended usage of Out of curiosity, how expensive is it to call |
2c8f4f5 to
5b05557
Compare
|
This discussion got side-tracked by some questions about whether allocating backtraces on the major heap would have a noticeable performance impact. I've just rebased this patch to trunk. On trunk, since #8691, plain However, when rebasing this I noticed a bug in #8691, in that it does not always prevent |
30da9b1 to
104ce5e
Compare
caml_alloc is now used allocate backtraces, which is safe now that caml_alloc can't call async callbacks.
104ce5e to
f3c650b
Compare
|
#8897 is merged, and this branch has just passed precheck (300), so I think this is ready to merge. |
|
The second commit about removing the stack-allocated buffer looks good to me. (I would happily help with a review of the first commit but the technical details still fly over my head.) |
|
I'd quite like to get this into 4.10. Are there any remaining issues? |
xavierleroy
left a comment
There was a problem hiding this comment.
I had a second look at the code and at the discussion. I agree it's good to go.
|
Thanks! |
|
I hope it's the right place to report it: I have a bunch of errors like the following one when I do gcc -MM -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE *.c | \
sed -e 's/\([^.]*\)\.o/\1_b.$(O)/' > .depend
signals_nat.c:192:2: error: #error "CONTEXT_SP is required if HAS_STACK_OVERFLOW_DETECTION is defined"
#error "CONTEXT_SP is required if HAS_STACK_OVERFLOW_DETECTION is defined"
^~~~~ |
Since the new stack overflow detection system (ocaml#8670), ASAN was confused. We fix the issue by resetting the 'use_sigaltstack' flag and by disabling ASAN on functions used by the stack overflow handler.
Since the new stack overflow detection system (#8670), ASAN was confused. We fix the issue by telling ASAN to not use a sigaltstack, and by disabling ASAN on functions used by the stack overflow handler.
To avoid allocating on the stack, keep the runtime lock so that OCaml objects are not moved around. This is needed because thread stack are very small on musl<1.1.21 (80k). Allocating a buffer of 64k bytes on the stack means that the stack overflow detection (`caml_c_call`) won't have enough space to allocate 32k and will segfault: see ocaml/ocaml#7490 for a similar bug report. `caml_c_call` has been properly fixed in behttps://github.com/ocaml/ocaml/pull/8670 and it also has been changed to only allocate 4k -- also solving our initial issue with too small thread stacks (as 64+8<80). Many thanks to @sadiqj for helping with debugging the issue. Co-authored-by: Frédéric Bour <fred@tarides.com>
To avoid allocating on the stack, keep the runtime lock so that OCaml objects are not moved around. This is needed because thread stack are very small on musl<1.1.21 (80k). Allocating a buffer of 64k bytes on the stack means that the stack overflow detection (`caml_c_call`) won't have enough space to allocate 32k and will segfault: see ocaml/ocaml#7490 for a similar bug report. `caml_c_call` has been properly fixed in behttps://github.com/ocaml/ocaml/pull/8670 and it also has been changed to only allocate 4k -- also solving our initial issue with too small thread stacks (as 64+8<80). Many thanks to @sadiqj for helping with debugging the issue. Co-authored-by: Frédéric Bour <fred@tarides.com>
Stack overflow detection is fixed in 4.10 at ocaml#8670.
Stack overflow detection is fixed in 4.10 at ocaml#8670. No changes needed.
Stack overflow detection doesn't work properly with systhreads, as shown by #7490. A fix was proposed a while ago in #1062, but stalled. This PR implements a simpler approach.
Instead of trying to work out the stack dimensions by querying OS interfaces, this patch defines a stack overflow as a SIGSEGV that happens on an address that's between the top of the stack and the current stack pointer (or slightly past the stack pointer, to handle some edge cases around function prologues).
There are a couple of changes:
calls to getrlimit are removed from the signal handler, inspecting the stack pointer instead.
all systhreads now reinitialise the sigaltstack (but not the signal handler) on startup.
POSIX says you should have to do the former but not the latter, on Linux it seems you need both(this is nonsense, I was testing it badly. Reinitialising sigaltstack and leaving the handler alone suffices). All threads share the same sigaltstack - they won't collide unless there's a segfault in C code, in which case all bets are already off.the stack probe limit on amd64 was lowered from 32k to 4k. Linux adds a 4k guard page at the end of thread stacks to catch overflow, but beyond that is free space. It is quite common (i.e. happened regularly in testing) for that space to be used by some other allocation. So, a stack probe of 32k from the current stack pointer has false positives: it crosses the guard page and touches some random bit of memory. A 4k probe cannot have false positives: it either lies in the stack or in the guard page.
caml_get_exception_raw_backtraceno longer allocates an 8K array on the stack. (This is an independent change, in a seperate commit. Including it here makes it safe to callPrintexc.print_backtracefrom aStack_overflowhandler).This has so far only been tested on Linux amd64, but I've added CONTEXT_SP support for other platforms. Hopefully it will work everywhere?