Skip to content

s390x: use GOT-indirect calls to fix heap corruption with shared runt…#14547

Merged
nojb merged 4 commits intoocaml:trunkfrom
Zaneham:fix-s390x-plt-heap-corruption
Mar 1, 2026
Merged

s390x: use GOT-indirect calls to fix heap corruption with shared runt…#14547
nojb merged 4 commits intoocaml:trunkfrom
Zaneham:fix-s390x-plt-heap-corruption

Conversation

@Zaneham
Copy link
Copy Markdown
Contributor

@Zaneham Zaneham commented Feb 12, 2026

This was a fun one. Classic storage overlay on s390x, on z/OS you'd get a nice S0C4 abend with a dump to work from,
but on Linux all you get is signal 11 and a long night with GDB.

The PLT lazy binding trampoline (_dl_runtime_resolve_vx) saves float registers relative to r15, assuming it's a proper
C stack frame. OCaml's r15 points to its own fiber stack, which lives on the heap. So, when runtime functions like
caml_alloc1 get called through the PLT with libasmrun_shared.so, the trampoline happily saves +infinity and epsilon
right over a skiplist cell. caml_skiplist_insert then tries to follow a forward pointer that's actually IEEE 754
infinity, and down we go.

Linking with -Wl,-z,now also fixes this by forcing eager symbol resolution at load time, but that just sidesteps the
problem. The codegen is still emitting calls that violate the s390x ABI whenever r15 is on the OCaml stack, and
anything else that assumes a valid C frame at r15 would hit the same class of bug.

The underlying ABI violation exists on all distros. It's confirmed on RHEL 8.9 and Fedora 41 but not observed on
Ubuntu 24.04, maybe due to differences in glibc's PLT resolver behaviour or heap layout. If anyone knows more, I would like your input. The bug is latent regardless.

Fix is straightforward: swap @plt calls for GOT-indirect calls (lgrl/basr instead of brasl). The GOT entries resolve at load time, so the trampoline never fires. This is the same pattern already used for data symbols in
emit_load_symbol_addr.

Tested on s390x RHEL 8.9:

  • Before: segfault (exit 139)
  • After: Hello, world of s390x!
  • LD_BIND_NOW=1 independently confirms the trampoline as the culprit

Fixes #13693
Related: #14514 - my previous PR attempt which was a "flop"

@xavierleroy
Copy link
Copy Markdown
Contributor

Nice detective work, thank you!

The codegen is still emitting calls that violate the s390x ABI whenever r15 is on the OCaml stack

That's correct, but it's intentional: the standard ABI is too inefficient for a functional language. (We're not wasting 160 bytes for every stack frame.)

, and anything else that assumes a valid C frame at r15 would hit the same class of bug.

We're careful to switch to the C stack before calling C functions (see the Iextcall instruction and the glue functions in runtime/s390x.S). But we can't do that for the dynamic resolver on @PLT functions...

Fix is straightforward: swap @plt calls for GOT-indirect calls (lgrl/basr instead of brasl). The GOT entries resolve at load time, so the trampoline never fires.

That's a good fix. The only downside is an increase in code size, but it's not too bad: +3% for the OCaml standard library. I didn't try to measure a possible increase in running time.

A fix that would avoid increasing code size would be to force "bind now" resolution, as per -Wl,-z now or LD_BIND_NOW=1, but I don't know how to do that from within the ocamlopt-generated files (as opposed to asking the end user to link specially).

Yet another option would be to maintain a "red zone" of 160 free bytes at the bottom of the OCaml stacks. There would be a single 160-byte area, not one per stack frame. That would give the dynamic resolver the space it expects. However, unlike the C stack, the OCaml stacks don't grow automatically, so it's still dangerous to run code produced by a C compiler (incl. the dynamic resolver) on the OCaml stacks.

So, I'm OK with merging this PR, but I'll wait a bit for others to comment.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Feb 20, 2026

Thanks for working on this! I've pushed a commit updating the test-in-prefix tests - segfaulting is no longer "permitted" on s390x (although the workaround code is still needed for riscv, which has a similar problem).

The failure can be seen on Jenkins for any of the recent trunk runs by searching the logs for nat_obj_shared where <exit 139> can then be seen on all the runs. With this PR in precheck#1137, there is no longer <exit 139>

@dra27
Copy link
Copy Markdown
Member

dra27 commented Feb 20, 2026

Would another option here be to effectively move all the OCaml-to-C parts of s390.S to the startup object of any program i.e. leave caml_start_program and the associated caml_callback_* functions where they are, but put the allocation, GC, fiber and tsan functions into a separate file which gets emitted with all the generated functions? They would then always be local to any program, and libasmrun_shared.so only exposes functions which strictly follow the ABI.

@xavierleroy
Copy link
Copy Markdown
Contributor

I thought about treating calls to runtime glue functions (the ones defined in s390x.S) specially, but then I realized that ocamlopt also generates @PLT calls for calls to OCaml functions. (At least in PIC mode, which is the default.) So, I thought these other @PLT calls could also create problems and should also be avoided.

Thinking some more about it, I believe @PLT references for calls to OCaml functions are always resolved eagerly, either statically if they occur in the main program or at dlopen time if they occur in a shared OCaml library.

So, the only @PLT references that are resolved on first call are (1) to runtime glue functions, and (2) to "noalloc" C functions. (2) is safe because the calls are taken on the C stack. (1) can be fixed as proposed in this PR, with a load address followed by an indirect call.

Emitting load address + indirect call for all OCaml function calls doesn't hurt code size much, from my measurements, but I don't know if there's an impact on execution time. Naively, all these indirect calls can be predicted perfectly, but it could use a lot of branch prediction table entries.

@Zaneham
Copy link
Copy Markdown
Contributor Author

Zaneham commented Feb 23, 2026

Hello @xavierleroy, apologies on the late reply I ended up being busy with a few things. Thanks for the information on the intentional ABI deviation. I'll actually go and see what narrowing it to just runtime glue calls will do and measure the branch prediction and performance. I'll report back when I have a spot of time.

@dra27 Thanks for testing this! I might also give your idea a try as well and I'll report back. My solution is a bit like trying to defuse a bomb with a sledgehammer so I'll try and find a more elegant solution that limits the increase in code size. If anything I'll have learned a thing or two about OCamls codegen.

Use GOT-indirect (lgrl @got + basr) only for runtime glue functions
(caml_call_gc, caml_alloc*, caml_c_call, caml_raise_exn, etc.) where
r15 may point at the OCaml fiber stack rather than a C stack frame.

Regular OCaml-to-OCaml calls and tailcalls use brasl @plt as before,
since these resolve eagerly and never hit the PLT lazy binding
trampoline that assumes a C-ABI stack layout.

This narrows the original fix (which made all calls GOT-indirect)
per review feedback, reducing code size and preserving direct branch
prediction for inter-module OCaml calls.
@Zaneham Zaneham force-pushed the fix-s390x-plt-heap-corruption branch from 2ef6880 to 85fcbcb Compare February 23, 2026 19:27
@Zaneham
Copy link
Copy Markdown
Contributor Author

Zaneham commented Feb 23, 2026

Okay so I tested my fix versus the narrower (and elegant) solution. emit_call now uses brasl @PLT for for regular OCaml-to-OCaml calls and tailcalls. I tested on s390x and theres still zero segfaults. I've run a small fib 30 program and the new solution saved 728 bytes (1,048,522 from my previous solution, 1,047,794 for the new solution).

I did consider @dra27's startup object approach too. It's arguably cleaner but would need several build system changes across multiple files and I'm not quite there yet in comfort doing that I'm still learning the codebase and all of its quirks. Thanks again for your time.

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good to me. Thanks !

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Mar 1, 2026

I took the liberty of pushing two tiny tweaks. Good to go once CI passes.

@nojb nojb added the merge-me label Mar 1, 2026
@nojb nojb merged commit fe32118 into ocaml:trunk Mar 1, 2026
35 checks passed
@nojb
Copy link
Copy Markdown
Contributor

nojb commented Mar 2, 2026

@Octachron: OK to cherry-pick to 5.5 (together with #14609)?

gasche added a commit that referenced this pull request Mar 2, 2026
@Octachron
Copy link
Copy Markdown
Member

Yes, as this is a bug fix and we are still in the alpha/beta release phase.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Mar 2, 2026

Cherry-pick to 5.5: 392c4ec

nojb added a commit that referenced this pull request Mar 2, 2026
@nojb
Copy link
Copy Markdown
Contributor

nojb commented Mar 2, 2026

Cherry-pick to 5.5: 392c4ec

And moved the Changes entry in trunk to the 5.5 section: d865bf7.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

libasmrun_shared.so segfaulting on s390x Fedora/RHEL

5 participants