Ensure that GC is not invoked from bounds check failures#11303
Ensure that GC is not invoked from bounds check failures#11303xavierleroy merged 5 commits intoocaml:trunkfrom
Conversation
|
|
||
| include systhreads | ||
| * hassysthreads | ||
| ** not-windows |
There was a problem hiding this comment.
Because this test relies on triggering GC via signal handlers, and I didn't think that trick worked on Windows.
There was a problem hiding this comment.
If testing non-systhreads and/or windows is desired, it looks like this bug can be triggered using multiple domains (see #11425).
sadiqj
left a comment
There was a problem hiding this comment.
Thanks for identifying this and a test.
Only one question, in switching to caml_raise_exception I see we're no longer resetting Caml_state->local_roots. Is there's a scenario involving caml_callback where that corrupts the list?
Right, must be performed! Otherwise an out-of-bound array access without any try...with will crash the program instead of being reported. |
|
I took the liberty to push a more prudent version directly on this branch. It adds a local |
|
Thanks for pointing out the issue with this patch, but I think a slightly different fix would be simpler. The issue is that the requirements for raising an exception are very different when the exception is raised from C vs. from OCaml. In OCaml:
while when an exception is raised from C:
The problem is that My initial patch used the OCaml rules instead, by directly calling The current version of this patch (thanks to @xavierleroy) manages to, I think, implement the safe common subset of the OCaml and C requirements, by not polling, checking for a handler, and optionally cleaning up local_roots. However, I think a simpler approach would be to separate the two uses, leaving (I'm about to go on vacation for a week-and-a-bit. I'd hoped to write and push another version of this patch beforehand, but oh well) |
It's no longer called from a signal handler.
Even for bounds check failures, we still need to check for missing exception handler and to adjust the local roots chain.
Divide the caml_array_bound_error function into separate C and OCaml versions.
e02a3e3 to
044b591
Compare
|
I've just rebased and pushed a new version of the code. This version keeps the original behaviour of |
|
I've taken a look. I think the proposed solution handles the problem where a GC is triggered off the bounds exception in a straight forward way. However I'm not convinced we can ignore the frame table construction in the case where backtraces are stashed on some platforms. On ARM64 MacOS, the native compiler gives a repeating (until the buffer is filled) backtrace for the following test: (* TEST
flags = "-g"
ocamlrunparam += ",b=1"
*)
let xs = [| 0; 1; 2 |]
let bad_bound_fn x =
!x + xs.(100)
let _ =
try
ignore (Sys.opaque_identity (bad_bound_fn (ref 0)));
with exn ->
Printf.printf "Uncaught exception %s\n" (Printexc.to_string exn);
Printexc.print_backtrace stdout;
print_endline "OK"The issue is that the backtrace tries to walk the stack but on ARM64 (as it stands) we are not able to do this. x86 seems (at first glance) to not give garbage in this situation. This backtrace problem might not be for this PR to fix. This PR doesn't make the backtrace situation worse but does fix segfaults and hangs as things stand. |
I don't think this PR does this. In the current code base, ocamlopt -g generates a frame descriptor for each call to |
That sounds like perhaps an independent bug in the ARM64 backend? A frame table entry is created and it should have a correct stack frame length and debug info, which is all you need to construct the backtrace. (What's missing is that it does not have a valid set of live registers, so you can't do GC at this point if the handler is in the same function) |
You've convinced me that the problem is probably in the ARM64 backend with |
|
My understanding of the current status:
|
xavierleroy
left a comment
There was a problem hiding this comment.
Looks good to me. I agree with @dra27 that the test should also work under Windows, because it doesn't rely on OS signals, just on preemption between threads, right?
| ENTER_FUNCTION | ||
| LEA_VAR(caml_array_bound_error, %rax) |
There was a problem hiding this comment.
If anyone wonders: ENTER_FUNCTION maintains %rbp as frame pointer in "with frame pointers" mode, and does nothing otherwise. I guess it would have to be matched by a LEAVE_FUNCTION just before tail-calling caml_c_call. @stedolan chose to omit the ENTER_FUNCTION / LEAVE_FUNCTION dance entirely, which is correct too and probably just as good in practice.
(If I understand correctly, the only case where it might make a difference is in frame-pointer mode, if a profiling interrupt hits while executing the LEA_VAR instruction, in which case the %rbp-based backtrace will not be accurate. Nothing to worry about.)
There was a problem hiding this comment.
Actually, ENTER_FUNCTION seems required to get a correct backtrace in frame-pointer mode.
Co-authored-by: David Allsopp <david.allsopp@metastack.com>
* Divide the caml_array_bound_error function into separate C and OCaml versions. The OCaml version, called from the assembly stubs, does not check for pending asynchronous actions. * Added test testsuite/tests/lib-systhreads/boundscheck.ml * caml_raise no longer needs a CAMLno_asan annotation It's no longer called from a signal handler. Co-authored-by: Xavier Leroy <xavier.leroy@college-de-france.fr> (cherry picked from commit 9c60432)
|
Merged in trunk and cherry-picked to 5.0 |
|
FYI: this PR breaks frame-pointer support. The following tests produce no backtraces when the out-of-bound exception is raised. (CC: @fabbing) |
|
Re-adding the |
|
Yes, you do need this
|
Since 4.12, there's a subtle bug in how array/string/bigarrray bounds checks failures are handled, which can lead to corruption or segfaults in programs that catch the exceptions arising from bounds check failures.
Different invariants apply for exceptions from C versus OCaml. Exceptions raised from C are assumed to be at a safe point: all live values are registered on the stack,
caml_local_rootsis in a sensible state, and so on. In particular, the GC may run if asynchronous actions are pending. Exceptions raised from OCaml, by contrast, do not have these requirements. Instead, the stack is simply cut to the point of the exception handler (possibly after collecting a backtrace).In order to make it possible to collect a backtrace, raise sites of OCaml exceptions have a frame table entry, but this frame table entry only contains debuginfo and has an invalid set of live registers. (This shouldn't matter when raising exceptions from OCaml, as this data is not used).
However, the current implementation of bounds check failures does not raise an exception from OCaml: instead, it switches to C and calls
caml_raise. This function has stronger requirements, and in particular may invoke GC if actions are pending (see #9722). If this occurs, the GC can use the invalid frame table of the bounds check error. This can cause problems if there are in fact live values, particularly when an array bounds check is raised and caught in the same function. The attached testcase demonstrates this, going wrong on all versions of OCaml since 4.12.The fix here is to skip the C-specific processing when raising a bounds check failure in
caml_array_bound_error. (Other potential fixes are to update the assembly stub to avoid ever entering C in this case, or to generate correct frame table entries detailing the live registers for all bounds check points. This change seemed simplest)