reintroduce native compiler for s390x #11712
|
Thanks for your work to restore s390x support in OCaml 5, much appreciated. Re: calling conventions, on pretty much every target, ocamlopt uses its own calling conventions to call OCaml functions, because the standard ELF ABI calling conventions are designed for C and not well suited to a functional language. Typically, OCaml's custom conventions will use more registers for parameter passing and try to reduce stack usage to a minimum, because we can have deeply nested recursive calls. For s390x, this means in particular that the 160 reserved bytes at the bottom of the stack are reserved only when calling a C function from OCaml. In OCaml 4, this is ensured by There are also a few places in runtime/s390x.S where C functions are called, e.g. In OCaml 5, the C stack and the OCaml stack differ, so the stack adjustments are a parameter to Iextcall and are to be performed after the switch to the C stack. Likewise, for the s390x.S functions, I'd expect the adjustments to take place after the stack switch. Let me know if you have a more specific question about this stack business. |
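To make the "deeply nested recursive calls" point concrete, here is a minimal sketch (not taken from the PR; the function names `sum_to` and `sum_to_tail` are hypothetical). The first function is not tail-recursive, so every call keeps a stack frame alive until its callee returns; the smaller each frame, the deeper such recursion can go before the stack is exhausted. The second version shows the tail-recursive form, whose stack usage does not grow with the recursion depth.

```ocaml
(* Non-tail-recursive: the addition happens after the recursive call
   returns, so each level of recursion needs its own stack frame. *)
let rec sum_to n = if n = 0 then 0 else n + sum_to (n - 1)

(* Tail-recursive variant: the recursive call is the last operation,
   so stack usage stays constant whatever the depth. *)
let sum_to_tail n =
  let rec go acc n = if n = 0 then acc else go (acc + n) (n - 1) in
  go 0 n

let () =
  (* The depth is an arbitrary illustrative value. *)
  let depth = 100_000 in
  assert (sum_to depth = sum_to_tail depth)
```

With a per-frame reservation of 160 bytes on every OCaml-to-OCaml call, the first function would consume far more stack per level than it needs, which is the motivation given above for reserving that area only around calls to C.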
Could you explain what's going on? Is it the OCaml stack that overflows (surprisingly, because it's much larger in OCaml 5 than in OCaml 4) or the C stack? Do you have a gdb backtrace? What is the workaround you mention? |
Crashing command is Workaround is here: Crash looks like this: The contents of It looks like there are a lot of nested calls, and the stack is eventually exhausted and overwritten by the instructions before the marked one: |
|
Your workaround changes |
|
Yes, it could be the case. |
a4e225b to adfe45d
|
Implemented stack reallocation and a bit of other stuff. The only missing thing I'm aware of is CFI. I'll try to implement it as well. |
|
CFI is nice to have when debugging the OCaml runtime system itself, but we can live without it, so don't go out of your way to implement it. When you're happy with your code, could you please remove the "draft" status? Then we'll try to find reviewers. |
adfe45d to 7f7fd8a
|
I think everything is ready for review, although CFI could be further improved. |
|
Is there a good reference for the s390x memory model? |
It seems to be TSO with a twist, see https://check.cs.princeton.edu/papers/dlustig_thesis.pdf section 3.3.1. |
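The memory model matters here because the backend has to compile OCaml 5 atomics so that the language-level guarantees hold on the target. As an illustration only (this is a sketch, not one of the testsuite programs, and the name `message_passing` is hypothetical), the classic message-passing pattern below relies on the atomic write and read to order the plain accesses to `data`. It assumes OCaml 5, where `Domain` and `Atomic` are part of the standard library.

```ocaml
(* Message passing between two domains. The producer writes plain data,
   then publishes it with an atomic flag. Once the consumer observes the
   flag, the OCaml memory model guarantees it also observes the data. *)
let message_passing () =
  let data = ref 0 in
  let ready = Atomic.make false in
  let producer =
    Domain.spawn (fun () ->
      data := 42;              (* plain (non-atomic) write *)
      Atomic.set ready true)   (* atomic publish *)
  in
  (* Spin until the flag is visible. *)
  while not (Atomic.get ready) do
    Domain.cpu_relax ()
  done;
  let seen = !data in          (* plain read, ordered after the atomic read *)
  Domain.join producer;
  seen

let () = assert (message_passing () = 42)
```

If a code generator allowed the plain write to be reordered after the atomic store, or the plain read before the atomic load, the assertion could fail; that is the kind of ordering a port has to preserve.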
|
Is there a way to dump/save all the assembly code generated when building the OCaml compiler and libraries? It could help to further improve CFI. |
|
Just to understand, why do we want to add support for a seemingly long-discontinued machine? |
|
s390x is a historical name. The current name for this architecture is IBM Z and it's definitely not discontinued. |
You can try to set the OCAMLPARAM environment variable: This will leave .s files all around the place... |
|
Did you run the testsuite recently? I'm getting a bunch of failed tests in native code (segmentation faults). This is on a LinuxONE VM running RHEL 8.5. |
|
Yeah, I ran it, on both s390x and amd64. Both had a similar number of failures. Should all tests be fixed? Or should I look only at specific ones? |
|
To my knowledge, there should be no test failure on trunk, so "0 failures" would be the target. |
|
Rechecked it on current trunk, and got 0 failed on amd64 as well. I'll work on this pull request further to ensure tests are passing. |
|
With the tip of your branch I'm seeing 154 failed tests. One of them seems to loop, so you should run the testsuite with a short timeout: |
|
It seems that tests involving out-of-bounds array accesses are failing systematically. I haven't managed to get gdb running correctly yet so I don't know exactly what goes wrong. But you can trigger a bug with very small programs: let () = try [||].(0) with _ -> () I get an |
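A slightly larger reproduction along the same lines could look like the sketch below. It is not from the testsuite; the helper name `raises_bound_error` is hypothetical, and it assumes the program is compiled without `-unsafe`, so that bounds checks are emitted. On a correct backend, each out-of-bounds access raises `Invalid_argument`, which the handler catches.

```ocaml
(* Returns true if [f ()] raises Invalid_argument, which is what a failed
   array bounds check is expected to raise; false if it returns normally. *)
let raises_bound_error f =
  try ignore (f ()); false with
  | Invalid_argument _ -> true

let () =
  let empty : int array = [||] in
  let a = [| 1; 2; 3 |] in
  assert (raises_bound_error (fun () -> empty.(0)));  (* empty array *)
  assert (raises_bound_error (fun () -> a.(3)));      (* one past the end *)
  assert (raises_bound_error (fun () -> a.(-1)));     (* negative index *)
  assert (not (raises_bound_error (fun () -> a.(2)))) (* last valid index *)
```

Building this with both ocamlc and ocamlopt and comparing the behaviour helps tell a native-code problem apart from a runtime one.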
runtime/s390x.S
| stg %r14, 0(%r15) | ||
| CFI_OFFSET(14, -168) | ||
| ENTER_FUNCTION | ||
| LEA_VAR(caml_array_bound_error_asm, %r2) |
| LEA_VAR(caml_array_bound_error_asm, %r2) | |
| LEA_VAR(caml_array_bound_error_asm, ADDITIONAL_ARG) |
This should fix a few errors with array bounds checks
|
The next error I'm seeing is a segfault while scanning the stack during the minor collection triggered by domain termination. I suspect the stack is supposed to be empty in that case, but it's not detected as such and the frame descriptor lookup returns a null pointer that is then dereferenced. |
|
I've created a branch with the fixes I've made here. |
|
I've updated my branch (still here).
I will try to look at the second issue when I have more time, but I'll need help with the first one. |
|
Thanks, I'm also still working on fixing tests. Will take a look at your branch. |
It breaks atomics by reordering operations.
Co-authored-by: Xavier Leroy <xavierleroy@users.noreply.github.com>
a580f4a to e9332d6
|
CLA received in good order. check-typo honored. Time for merging! |
|
CI shows that the "forbidden" and (to a lesser extent) "publish" tests take a lot of time on our 2-core test VM. Could this mean we need an s390x definition for |
|
.. and |
|
CI also fails on testsuite/tests/callback/callback_effects_gc.ml , but not in flambda mode, go figure... |
|
Beginning of an explanation: this test is run with a very small minor heap ( When running |
|
I think I found it: in |
Update the s390x native-code generator and the s390x.S assembly glue for OCaml 5. Co-authored-by: Vincent Laviron <vincent.laviron@gmail.com> (cherry picked from commit 3fefff5)
This is the first version of the reintroduction of the native compiler for s390x.
Some issues remain, some things could be improved, and there are open questions. Help is appreciated!
The OCaml stack size is increased. This is a workaround for stack exhaustion crashes, such as the one seen when running the command "../../ocamlc.opt -nostdlib -I ../../stdlib -g -c -for-pack Dynlink_compilerlibs -strict-sequence -principal -absname -w +a-4-9-40-41-42-44-45-48 -warn-error +A -bin-annot -strict-formats -I byte -I dynlink_compilerlibs -o dynlink_compilerlibs/clflags.cmo dynlink_compilerlibs/clflags.ml" from the "otherlibs/dynlink" directory, which is run as part of building the native compiler.
CFI is not working correctly yet.
It looks like OCaml doesn't follow the s390x ABI when generating s390x assembly code: it often doesn't allocate 160 bytes on the stack. For example, in the function "camlStdlib.entry":
Between the entry to function camlStdlib.entry at 0x00000000014cf890 and the call to caml_call_gc at 0x00000000014d1558, only 16 bytes are allocated on the stack (register %r15), whereas the s390x ABI requires at least 160 bytes. Is this deviation from the ABI intended? Changes like these affect how CFI has to be updated.