Skip to content

Take PLT-clobbered registers into account at Ialloc#1304

Merged
xavierleroy merged 3 commits intoocaml:trunkfrom
mshinwell:dynlink_clobbering
Aug 28, 2017
Merged

Take PLT-clobbered registers into account at Ialloc#1304
xavierleroy merged 3 commits intoocaml:trunkfrom
mshinwell:dynlink_clobbering

Conversation

@mshinwell
Copy link
Copy Markdown
Contributor

Calls in dynamically-linked libraries on Unix-like platforms are indirect jumps through the PLT (procedure linkage table). In the usual case of lazy symbol binding the PLT entries initially point at stubs which invoke the dynamic linker to resolve the relevant symbol. The PLT entry is overwritten with the jump target once it has been resolved. (This latter step happens at startup if non-lazy binding is in effect.)

I believe the PLT stubs may clobber those registers which are none of: (a) C argument registers; (b) C return value registers; (c) C callee-save registers. On x86-64 this set comprises r10 and r11. The backend has correctly avoided these for the passing of arguments for some time. However these registers must also be avoided between OCaml code and the assembly code in caml_call_gc which saves registers into the caml_gc_regs block, since caml_call_gc itself may be called via the PLT. Otherwise what happens is that any live values in r10 and/or r11 might be clobbered by the PLT stub and then saved into the caml_gc_regs block.

This bug was responsible for a number of obscure segfaults which @sliquister did a fine job of investigating. Such failures are obscure: the first allocation in a given dynamic library to trigger a GC needs to have something live across it in r10 and/or r11.

@mshinwell mshinwell added the bug label Aug 28, 2017
@mshinwell mshinwell requested a review from xavierleroy August 28, 2017 08:55
@nojb
Copy link
Copy Markdown
Contributor

nojb commented Aug 28, 2017

Could this issue also affect other platforms ?

@mshinwell
Copy link
Copy Markdown
Contributor Author

Yeah, it might do. I am trying to work that out at the moment. I don't think it affects i386 though, but will check again.

@mshinwell
Copy link
Copy Markdown
Contributor Author

Right, I think this fixes AArch64 as well now. I also made some improvements to the fix for x86-64. Still looking at other platforms.

@mshinwell
Copy link
Copy Markdown
Contributor Author

Scratch that: in fact I don't think anything needs changing for AArch64. The code is kind of intricate. It looks like the two problematic registers (x16 and x17) won't ever be used by the register allocator, so we should be ok as-is.

@mshinwell
Copy link
Copy Markdown
Contributor Author

I looked through the other backends and I don't think any changes are needed. @xavierleroy I would appreciate it if you could check too.

For s390x, the trampolines file here has a useful comment:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/s390/s390-64/dl-trampoline.h;h=e1f95e2ecd56162d53d74776d6e1e9b1915458f6;hb=refs/heads/master

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Approved.

@@ -309,7 +334,10 @@ let max_register_pressure = function
if fp then [| 3; 0 |] else [| 4; 0 |]
| Iintop(Idiv | Imod) | Iintop_imm((Idiv | Imod), _) ->
if fp then [| 10; 16 |] else [| 11; 16 |]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is completely orthogonal to the present pull request, but just for the record: these if fp are rather ugly and should eventually be handled along the lines of num_destroyed_by_plt_stub, i.e. with a num_allocatable_int_regs constant that is 13 or 12 depending on fp.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I noticed that some of these are inaccurate as regards Spacetime, which sometimes occupies another hard register. I can try to fix these all at once.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end these "max pressure" numbers are heuristics for preventive spilling, so no wrong code will be generated if the numbers are off. But, yes, there is room for improvement here, both in precision of the numbers and in elegance of the code!

@xavierleroy xavierleroy changed the title Save PLT-clobbered registers at Ialloc Take PLT-clobbered registers into account at Ialloc Aug 28, 2017
@xavierleroy
Copy link
Copy Markdown
Contributor

I took the liberty of rephrasing the title of this PR. The original "Save PLT-clobbered registers" suggested that those registers would be saved and restored across calls to Ialloc, which would not be good.

@xavierleroy
Copy link
Copy Markdown
Contributor

For ARM64 I confirm that registers x16 and x17 are not allocatable and used only as temporaries and never across a function call.

@mshinwell
Copy link
Copy Markdown
Contributor Author

I think this should be cherry-picked to 4.05 as well. Any other opinions?

@xavierleroy
Copy link
Copy Markdown
Contributor

I'll squash-and-merge in trunk first, then look into 4.05.

@xavierleroy xavierleroy merged commit 70c585d into ocaml:trunk Aug 28, 2017
xavierleroy pushed a commit that referenced this pull request Aug 28, 2017
Mark PLT-clobbered registers as destroyed across the Ialloc instruction.

Currently only x86-64 is affected, in PIC mode only, and only with the glibc dynamic loader.

Cherry-pick of [trunk 70c585d]
@xavierleroy
Copy link
Copy Markdown
Contributor

Cherry-picked to 4.05 branch, after the usual fight with the Changes file.

xclerc pushed a commit to xclerc/ocaml that referenced this pull request Nov 9, 2017
Mark PLT-clobbered registers as destroyed across the Ialloc instruction.

Currently only x86-64 is affected, in PIC mode only, and only with the glibc dynamic loader.

Cherry-pick of [trunk 70c585d]
EduardoRFS pushed a commit to EduardoRFS/ocaml that referenced this pull request Aug 8, 2020
Mark PLT-clobbered registers as destroyed across the Ialloc instruction.

Currently only x86-64 is affected, in PIC mode only, and only with the glibc dynamic loader.

Cherry-pick of [trunk 70c585d]
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants