Skip to content

Fix stack overflow detection with systhreads#8670

Merged
xavierleroy merged 2 commits intoocaml:trunkfrom
stedolan:stack-overflow-fix
Oct 11, 2019
Merged

Fix stack overflow detection with systhreads#8670
xavierleroy merged 2 commits intoocaml:trunkfrom
stedolan:stack-overflow-fix

Conversation

@stedolan
Copy link
Copy Markdown
Contributor

@stedolan stedolan commented May 14, 2019

Stack overflow detection doesn't work properly with systhreads, as shown by #7490. A fix was proposed a while ago in #1062, but stalled. This PR implements a simpler approach.

Instead of trying to work out the stack dimensions by querying OS interfaces, this patch defines a stack overflow as a SIGSEGV that happens on an address that's between the top of the stack and the current stack pointer (or slightly past the stack pointer, to handle some edge cases around function prologues).

There are a couple of changes:

  • calls to getrlimit are removed from the signal handler, inspecting the stack pointer instead.

  • all systhreads now reinitialise the sigaltstack (but not the signal handler) on startup. POSIX says you should have to do the former but not the latter, on Linux it seems you need both (this is nonsense, I was testing it badly. Reinitialising sigaltstack and leaving the handler alone suffices). All threads share the same sigaltstack - they won't collide unless there's a segfault in C code, in which case all bets are already off.

  • the stack probe limit on amd64 was lowered from 32k to 4k. Linux adds a 4k guard page at the end of thread stacks to catch overflow, but beyond that is free space. It is quite common (i.e. happened regularly in testing) for that space to be used by some other allocation. So, a stack probe of 32k from the current stack pointer has false positives: it crosses the guard page and touches some random bit of memory. A 4k probe cannot have false positives: it either lies in the stack or in the guard page.

  • caml_get_exception_raw_backtrace no longer allocates an 8K array on the stack. (This is an independent change, in a seperate commit. Including it here makes it safe to call Printexc.print_backtrace from a Stack_overflow handler).

This has so far only been tested on Linux amd64, but I've added CONTEXT_SP support for other platforms. Hopefully it will work everywhere?

@stedolan
Copy link
Copy Markdown
Contributor Author

cc @mshinwell, @xavierleroy

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see this work restart. A couple of questions below.

@gasche
Copy link
Copy Markdown
Member

gasche commented May 15, 2019

846ce83 (the change to runtime/backtrace.c to use caml_alloc_shr instead of caml_alloc, and get rid of a large on-stack buffer) looks like a very nice change on its own -- depending on how fast you think the present PR is going to get a decision, it may make sense to send it as a separate pre-PR.

@let-def
Copy link
Copy Markdown
Contributor

let-def commented May 15, 2019

@gasche @stedolan: Doesn't this risk to put a lot of short-lived values in the major heap? I can see this happening when get_raw_backtrace is used in conjunction with reraise_with_backtrace.

@gasche
Copy link
Copy Markdown
Member

gasche commented May 15, 2019

When I first thought about the change, my idea was that callstacks are typically (1) fairly large (often larger than Max_young_wosize?) and (2) likely to get stored later for auditing, and thus long-lived, or to come just before the program errors out, and thus we don't care.

(I had another paragraph here which I guess was confused about when backtraces are generated by the compiler when using exceptions for control-flow, and people are not using raise_notrace. I need to remind myself of how that actually works or let someone else comment on that part -- it could be indeed that this use-case generates a lot of backtraces and discards most of them immediately, but maybe not.)

@stedolan
Copy link
Copy Markdown
Contributor Author

The situation with exceptions are used for control flow, where backtraces are unnecessarily generated and then discarded, is indeed annoyingly common. The importance of performance in this situation is why caml_stash_backtrace still has a static global buffer.

This change affects only Printexc.get_raw_backtrace (and a few other functions in Printexc), which use a seperate codepath from caml_stash_backtrace. I don't think this is used anywhere performance-sensitive.

@gasche
Copy link
Copy Markdown
Member

gasche commented May 15, 2019

Will the backtraces collected by statmemprof be affected? (Those are expected to follow the lifetime of the memory block they track, so we can expect that many of them will be short-lived.)

@stedolan
Copy link
Copy Markdown
Contributor Author

No, statmemprof does not use this mechanism to collect callstacks. Statmemprof is interested in the current callstack, so it walks the stack to collect it. These functions concern the backtrace of the most recent exception, which is stored in a global buffer.

@stedolan stedolan force-pushed the stack-overflow-fix branch from 74efb3b to e1d6305 Compare May 15, 2019 15:37
@stedolan
Copy link
Copy Markdown
Contributor Author

I changed this patch to remove the sharing of sigaltstacks. The reasoning why this was OK was based on considering only SIGSEGV; there are other signals and they share a stack.

@stedolan stedolan changed the title [WIP] Fix stack overflow detection with systhreads Fix stack overflow detection with systhreads May 15, 2019
@xavierleroy
Copy link
Copy Markdown
Contributor

This change affects only Printexc.get_raw_backtrace (and a few other functions in Printexc), which use a seperate codepath from caml_stash_backtrace.

This is true.

I don't think this is used anywhere performance-sensitive.

This is perhaps less true these days. For example, Fun.protect calls Printexc.get_raw_backtrace if an exception occurs. I would still hope that Fun.protect is not used in conjunction with functions that raise often, but who knows...

@gasche
Copy link
Copy Markdown
Member

gasche commented May 16, 2019

I looked for usage of Misc.try_finally (a parent of Fun.protect)
in the compiler codebase, I have the impression that all
use-cases concern blocks where the exception path is indeed
exceptional. It is also used at a fairly coarse
granularity (typically, the cleanup action is to remove
a temporary file), so I think additional costs would be
amortized.

I haven't seen a use of Fun.protect around exceptions used for
control-flow, but that seems entirely possible to me. (For example,
a SAT solver where the "restart" action, every N decision steps,
is implemented by raising an exception, may use Fun.protect to unwind
the decision stack.) I think it would make sense to also offer a
Fun.protect_notrace function, cousin of raise_notrace, that does
not capture the call stack and uses a reraise_notrace.

Let me know if you like the idea and I'll shoot a separate PR or issue report.

@stedolan
Copy link
Copy Markdown
Contributor Author

I think it would make sense to also offer a
Fun.protect_notrace function, cousin of raise_notrace, that does
not capture the call stack and uses a reraise_notrace.

Currently (both before and after this patch), if raise_notrace is used, then Printexc.get_raw_backtrace is very fast (it does not allocate at all, since there is no backtrace). So I don't think this function is necessary. Effectively, Fun.protect does a reraise_notrace if the original exception came from raise_notrace, which I think is the right behaviour.

@gasche
Copy link
Copy Markdown
Member

gasche commented May 21, 2019

Ah, indeed. Sorry for the internal confusedness.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented May 23, 2019

I looked for usage of Misc.try_finally (a parent of Fun.protect)
in the compiler codebase, I have the impression that all
use-cases concern blocks where the exception path is indeed
exceptional. It is also used at a fairly coarse
granularity (typically, the cleanup action is to remove
a temporary file), so I think additional costs would be
amortized.

In the following review of uses of try_finally in the compiler: #2118 (comment), I have found several cases where it is used for expressing temporary mutation, where the finally just rolls back some state. This is an idiomatic use of unwind-protect, and is a use-case that can occur in algorithmically-intensive situations (e.g. if exceptions are used for control flow). It seems incorrect to me to assume in general that resources are few and disposal is expensive (amortizes the cost of recording the stack trace).

I haven't seen a use of Fun.protect around exceptions used for
control-flow, but that seems entirely possible to me. (For example,
a SAT solver where the "restart" action, every N decision steps,
is implemented by raising an exception, may use Fun.protect to unwind
the decision stack.)

Yes, this is a valid use-case, actually I have found 5 of them in the aforementioned audit. Control flow is the intended usage of Fun.protect, when distinguished from bracket-style resource management that would do masking for async exceptions. This use-case was in fact the reason put forward in favour of including Fun.protect with its current type, whatever the fate of asynchronous exceptions would be.

Out of curiosity, how expensive is it to call Printexc.get_raw_backtrace immediately after Printexc.raise_with_backtrace, for instance in the case where the user has nested Fun.protects?

@stedolan stedolan force-pushed the stack-overflow-fix branch from 2c8f4f5 to 5b05557 Compare August 27, 2019 11:23
@stedolan
Copy link
Copy Markdown
Contributor Author

This discussion got side-tracked by some questions about whether allocating backtraces on the major heap would have a noticeable performance impact.

I've just rebased this patch to trunk. On trunk, since #8691, plain caml_alloc should not trigger callbacks. So, the new version of this patch just calls caml_alloc (which allocates on the minor or major heap as appropriate).

However, when rebasing this I noticed a bug in #8691, in that it does not always prevent caml_alloc from triggering callbacks. This bug is fixed in #8897, so we shouldn't merge this patch until after #8897.

@stedolan stedolan force-pushed the stack-overflow-fix branch 2 times, most recently from 30da9b1 to 104ce5e Compare August 27, 2019 13:56
caml_alloc is now used allocate backtraces, which is safe now that
caml_alloc can't call async callbacks.
@stedolan stedolan force-pushed the stack-overflow-fix branch from 104ce5e to f3c650b Compare August 27, 2019 13:58
@stedolan
Copy link
Copy Markdown
Contributor Author

#8897 is merged, and this branch has just passed precheck (300), so I think this is ready to merge.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Sep 24, 2019

The second commit about removing the stack-allocated buffer looks good to me. (I would happily help with a review of the first commit but the technical details still fly over my head.)

@stedolan
Copy link
Copy Markdown
Contributor Author

I'd quite like to get this into 4.10. Are there any remaining issues?

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a second look at the code and at the discussion. I agree it's good to go.

@xavierleroy xavierleroy merged commit d29a01e into ocaml:trunk Oct 11, 2019
@stedolan
Copy link
Copy Markdown
Contributor Author

Thanks!

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Oct 11, 2019

I hope it's the right place to report it: I have a bunch of errors like the following one when I do make depend inside the runtime directory:

gcc -MM -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE   *.c | \
  sed -e 's/\([^.]*\)\.o/\1_b.$(O)/' > .depend
signals_nat.c:192:2: error: #error "CONTEXT_SP is required if HAS_STACK_OVERFLOW_DETECTION is defined"
 #error "CONTEXT_SP is required if HAS_STACK_OVERFLOW_DETECTION is defined"
  ^~~~~

jhjourdan added a commit to jhjourdan/ocaml that referenced this pull request Oct 18, 2019
Since the new stack overflow detection system (ocaml#8670), ASAN was confused.
We fix the issue by resetting the 'use_sigaltstack' flag and by disabling
ASAN on functions used by the stack overflow handler.
xavierleroy pushed a commit that referenced this pull request Oct 19, 2019
Since the new stack overflow detection system (#8670), ASAN was confused.
We fix the issue by telling ASAN to not use a sigaltstack, and by disabling
ASAN on functions used by the stack overflow handler.
xavierleroy pushed a commit that referenced this pull request Oct 19, 2019
Since the new stack overflow detection system (#8670), ASAN was confused.
We fix the issue by telling ASAN to not use a sigaltstack, and by disabling
ASAN on functions used by the stack overflow handler.

(cherry picked from commit 5a53560)
samoht added a commit to samoht/index that referenced this pull request Nov 20, 2019
To avoid allocating on the stack, keep the runtime lock so that OCaml objects
are not moved around.

This is needed because thread stack are very small on musl<1.1.21 (80k).

Allocating a buffer of 64k bytes on the stack means that the stack overflow
detection (`caml_c_call`) won't have enough space to allocate 32k and will
segfault: see ocaml/ocaml#7490 for a similar bug
report.

`caml_c_call` has been properly fixed in behttps://github.com/ocaml/ocaml/pull/8670
and it also has been changed to only allocate 4k -- also solving our initial
issue with too small thread stacks (as 64+8<80).

Many thanks to @sadiqj for helping with debugging the issue.

Co-authored-by: Frédéric Bour <fred@tarides.com>
gs0510 pushed a commit to gs0510/index that referenced this pull request Dec 6, 2019
To avoid allocating on the stack, keep the runtime lock so that OCaml objects
are not moved around.

This is needed because thread stack are very small on musl<1.1.21 (80k).

Allocating a buffer of 64k bytes on the stack means that the stack overflow
detection (`caml_c_call`) won't have enough space to allocate 32k and will
segfault: see ocaml/ocaml#7490 for a similar bug
report.

`caml_c_call` has been properly fixed in behttps://github.com/ocaml/ocaml/pull/8670
and it also has been changed to only allocate 4k -- also solving our initial
issue with too small thread stacks (as 64+8<80).

Many thanks to @sadiqj for helping with debugging the issue.

Co-authored-by: Frédéric Bour <fred@tarides.com>
gadmm added a commit to gadmm/ocaml that referenced this pull request Jan 28, 2020
gadmm added a commit to gadmm/ocaml that referenced this pull request Jan 28, 2020
Stack overflow detection is fixed in 4.10 at ocaml#8670.

No changes needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants