s390x: use GOT-indirect calls to fix heap corruption with shared runt… by Zaneham · Pull Request #14547 · ocaml/ocaml

Zaneham · 2026-02-12T11:42:50Z

This was a fun one. Classic storage overlay on s390x, on z/OS you'd get a nice S0C4 abend with a dump to work from,
but on Linux all you get is signal 11 and a long night with GDB.

The PLT lazy binding trampoline (_dl_runtime_resolve_vx) saves float registers relative to r15, assuming it's a proper
C stack frame. OCaml's r15 points to its own fiber stack, which lives on the heap. So, when runtime functions like
caml_alloc1 get called through the PLT with libasmrun_shared.so, the trampoline happily saves +infinity and epsilon
right over a skiplist cell. caml_skiplist_insert then tries to follow a forward pointer that's actually IEEE 754
infinity, and down we go.

Linking with -Wl,-z,now also fixes this by forcing eager symbol resolution at load time, but that just sidesteps the
problem. The codegen is still emitting calls that violate the s390x ABI whenever r15 is on the OCaml stack, and
anything else that assumes a valid C frame at r15 would hit the same class of bug.

The underlying ABI violation exists on all distros. It's confirmed on RHEL 8.9 and Fedora 41 but not observed on
Ubuntu 24.04, maybe due to differences in glibc's PLT resolver behaviour or heap layout. If anyone knows more, I would like your input. The bug is latent regardless.

Fix is straightforward: swap @plt calls for GOT-indirect calls (lgrl/basr instead of brasl). The GOT entries resolve at load time, so the trampoline never fires. This is the same pattern already used for data symbols in
emit_load_symbol_addr.

Tested on s390x RHEL 8.9:

Before: segfault (exit 139)
After: Hello, world of s390x!
LD_BIND_NOW=1 independently confirms the trampoline as the culprit

Fixes #13693
Related: #14514 - my previous PR attempt which was a "flop"

xavierleroy · 2026-02-13T15:34:00Z

Nice detective work, thank you!

The codegen is still emitting calls that violate the s390x ABI whenever r15 is on the OCaml stack

That's correct, but it's intentional: the standard ABI is too inefficient for a functional language. (We're not wasting 160 bytes for every stack frame.)

, and anything else that assumes a valid C frame at r15 would hit the same class of bug.

We're careful to switch to the C stack before calling C functions (see the Iextcall instruction and the glue functions in runtime/s390x.S). But we can't do that for the dynamic resolver on @PLT functions...

Fix is straightforward: swap @plt calls for GOT-indirect calls (lgrl/basr instead of brasl). The GOT entries resolve at load time, so the trampoline never fires.

That's a good fix. The only downside is an increase in code size, but it's not too bad: +3% for the OCaml standard library. I didn't try to measure a possible increase in running time.

A fix that would avoid increasing code size would be to force "bind now" resolution, as per -Wl,-z now or LD_BIND_NOW=1, but I don't know how to do that from within the ocamlopt-generated files (as opposed to asking the end user to link specially).

Yet another option would be to maintain a "red zone" of 160 free bytes at the bottom of the OCaml stacks. There would be a single 160-byte area, not one per stack frame. That would give the dynamic resolver the space it expects. However, unlike the C stack, the OCaml stacks don't grow automatically, so it's still dangerous to run code produced by a C compiler (incl. the dynamic resolver) on the OCaml stacks.

So, I'm OK with merging this PR, but I'll wait a bit for others to comment.

dra27 · 2026-02-20T10:14:06Z

Thanks for working on this! I've pushed a commit updating the test-in-prefix tests - segfaulting is no longer "permitted" on s390x (although the workaround code is still needed for riscv, which has a similar problem).

The failure can be seen on Jenkins for any of the recent trunk runs by searching the logs for nat_obj_shared where <exit 139> can then be seen on all the runs. With this PR in precheck#1137, there is no longer <exit 139>

dra27 · 2026-02-20T16:50:33Z

Would another option here be to effectively move all the OCaml-to-C parts of s390.S to the startup object of any program i.e. leave caml_start_program and the associated caml_callback_* functions where they are, but put the allocation, GC, fiber and tsan functions into a separate file which gets emitted with all the generated functions? They would then always be local to any program, and libasmrun_shared.so only exposes functions which strictly follow the ABI.

xavierleroy · 2026-02-22T08:54:53Z

I thought about treating calls to runtime glue functions (the ones defined in s390x.S) specially, but then I realized that ocamlopt also generates @PLT calls for calls to OCaml functions. (At least in PIC mode, which is the default.) So, I thought these other @PLT calls could also create problems and should also be avoided.

Thinking some more about it, I believe @PLT references for calls to OCaml functions are always resolved eagerly, either statically if they occur in the main program or at dlopen time if they occur in a shared OCaml library.

So, the only @PLT references that are resolved on first call are (1) to runtime glue functions, and (2) to "noalloc" C functions. (2) is safe because the calls are taken on the C stack. (1) can be fixed as proposed in this PR, with a load address followed by an indirect call.

Emitting load address + indirect call for all OCaml function calls doesn't hurt code size much, from my measurements, but I don't know if there's an impact on execution time. Naively, all these indirect calls can be predicted perfectly, but it could use a lot of branch prediction table entries.

Zaneham · 2026-02-23T11:39:57Z

Hello @xavierleroy, apologies on the late reply I ended up being busy with a few things. Thanks for the information on the intentional ABI deviation. I'll actually go and see what narrowing it to just runtime glue calls will do and measure the branch prediction and performance. I'll report back when I have a spot of time.

@dra27 Thanks for testing this! I might also give your idea a try as well and I'll report back. My solution is a bit like trying to defuse a bomb with a sledgehammer so I'll try and find a more elegant solution that limits the increase in code size. If anything I'll have learned a thing or two about OCamls codegen.

@got

Use GOT-indirect (lgrl @got + basr) only for runtime glue functions (caml_call_gc, caml_alloc*, caml_c_call, caml_raise_exn, etc.) where r15 may point at the OCaml fiber stack rather than a C stack frame. Regular OCaml-to-OCaml calls and tailcalls use brasl @plt as before, since these resolve eagerly and never hit the PLT lazy binding trampoline that assumes a C-ABI stack layout. This narrows the original fix (which made all calls GOT-indirect) per review feedback, reducing code size and preserving direct branch prediction for inter-module OCaml calls.

Zaneham · 2026-02-23T19:36:27Z

Okay so I tested my fix versus the narrower (and elegant) solution. emit_call now uses brasl @PLT for for regular OCaml-to-OCaml calls and tailcalls. I tested on s390x and theres still zero segfaults. I've run a small fib 30 program and the new solution saved 728 bytes (1,048,522 from my previous solution, 1,047,794 for the new solution).

I did consider @dra27's startup object approach too. It's arguably cleaner but would need several build system changes across multiple files and I'm not quite there yet in comfort doing that I'm still learning the codebase and all of its quirks. Thanks again for your time.

xavierleroy

Looks very good to me. Thanks !

asmcomp/s390x/emit.mlp

Changes

nojb · 2026-03-01T08:11:11Z

I took the liberty of pushing two tiny tweaks. Good to go once CI passes.

nojb · 2026-03-02T04:27:25Z

@Octachron: OK to cherry-pick to 5.5 (together with #14609)?

Re-add missing commit from #14547

Octachron · 2026-03-02T09:16:22Z

Yes, as this is a bug fix and we are still in the alpha/beta release phase.

nojb · 2026-03-02T10:03:28Z

Cherry-pick to 5.5: 392c4ec

nojb · 2026-03-02T10:07:06Z

Cherry-pick to 5.5: 392c4ec

And moved the Changes entry in trunk to the 5.5 section: d865bf7.

nojb assigned xavierleroy Feb 18, 2026

Zaneham force-pushed the fix-s390x-plt-heap-corruption branch from 2ef6880 to 85fcbcb Compare February 23, 2026 19:27

Merge branch 'trunk' into fix-s390x-plt-heap-corruption

9977922

xavierleroy approved these changes Feb 28, 2026

View reviewed changes

nojb reviewed Mar 1, 2026

View reviewed changes

asmcomp/s390x/emit.mlp Outdated Show resolved Hide resolved

nojb reviewed Mar 1, 2026

View reviewed changes

Changes Outdated Show resolved Hide resolved

nojb added 2 commits March 1, 2026 09:10

Apply suggestion from @nojb

7af90a8

Apply suggestion from @nojb

8262d94

nojb added the merge-me label Mar 1, 2026

nojb merged commit fe32118 into ocaml:trunk Mar 1, 2026
35 checks passed

dra27 mentioned this pull request Mar 1, 2026

Re-add missing commit from #14547 #14609

Merged

gasche added a commit that referenced this pull request Mar 2, 2026

Merge pull request #14609 from dra27/s390x-shared-testing

5924b83

Re-add missing commit from #14547

nojb mentioned this pull request Mar 2, 2026

Fix shared library crash on s390x (RHEL/Fedora) #14514

Closed

nojb added a commit that referenced this pull request Mar 2, 2026

Move Changes entry of #14547 to 5.5 section

d865bf7

Conversation

Zaneham commented Feb 12, 2026

Uh oh!

xavierleroy commented Feb 13, 2026

Uh oh!

dra27 commented Feb 20, 2026

Uh oh!

dra27 commented Feb 20, 2026

Uh oh!

xavierleroy commented Feb 22, 2026

Uh oh!

Zaneham commented Feb 23, 2026

Uh oh!

Zaneham commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xavierleroy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

nojb commented Mar 1, 2026

Uh oh!

Uh oh!

nojb commented Mar 2, 2026

Uh oh!

Octachron commented Mar 2, 2026

Uh oh!

nojb commented Mar 2, 2026

Uh oh!

nojb commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Zaneham commented Feb 23, 2026 •

edited

Loading