Take PLT-clobbered registers into account at Ialloc#1304
Take PLT-clobbered registers into account at Ialloc#1304xavierleroy merged 3 commits intoocaml:trunkfrom
Conversation
|
Could this issue also affect other platforms ? |
|
Yeah, it might do. I am trying to work that out at the moment. I don't think it affects i386 though, but will check again. |
|
Right, I think this fixes AArch64 as well now. I also made some improvements to the fix for x86-64. Still looking at other platforms. |
|
Scratch that: in fact I don't think anything needs changing for AArch64. The code is kind of intricate. It looks like the two problematic registers (x16 and x17) won't ever be used by the register allocator, so we should be ok as-is. |
|
I looked through the other backends and I don't think any changes are needed. @xavierleroy I would appreciate it if you could check too. For s390x, the trampolines file here has a useful comment: |
xavierleroy
left a comment
There was a problem hiding this comment.
Looks good to me. Approved.
| @@ -309,7 +334,10 @@ let max_register_pressure = function | |||
| if fp then [| 3; 0 |] else [| 4; 0 |] | |||
| | Iintop(Idiv | Imod) | Iintop_imm((Idiv | Imod), _) -> | |||
| if fp then [| 10; 16 |] else [| 11; 16 |] | |||
There was a problem hiding this comment.
This is completely orthogonal to the present pull request, but just for the record: these if fp are rather ugly and should eventually be handled along the lines of num_destroyed_by_plt_stub, i.e. with a num_allocatable_int_regs constant that is 13 or 12 depending on fp.
There was a problem hiding this comment.
Also, I noticed that some of these are inaccurate as regards Spacetime, which sometimes occupies another hard register. I can try to fix these all at once.
There was a problem hiding this comment.
In the end these "max pressure" numbers are heuristics for preventive spilling, so no wrong code will be generated if the numbers are off. But, yes, there is room for improvement here, both in precision of the numbers and in elegance of the code!
|
I took the liberty of rephrasing the title of this PR. The original "Save PLT-clobbered registers" suggested that those registers would be saved and restored across calls to Ialloc, which would not be good. |
|
For ARM64 I confirm that registers |
|
I think this should be cherry-picked to 4.05 as well. Any other opinions? |
|
I'll squash-and-merge in trunk first, then look into 4.05. |
Mark PLT-clobbered registers as destroyed across the Ialloc instruction. Currently only x86-64 is affected, in PIC mode only, and only with the glibc dynamic loader. Cherry-pick of [trunk 70c585d]
|
Cherry-picked to 4.05 branch, after the usual fight with the Changes file. |
Mark PLT-clobbered registers as destroyed across the Ialloc instruction. Currently only x86-64 is affected, in PIC mode only, and only with the glibc dynamic loader. Cherry-pick of [trunk 70c585d]
Mark PLT-clobbered registers as destroyed across the Ialloc instruction. Currently only x86-64 is affected, in PIC mode only, and only with the glibc dynamic loader. Cherry-pick of [trunk 70c585d]
Calls in dynamically-linked libraries on Unix-like platforms are indirect jumps through the PLT (procedure linkage table). In the usual case of lazy symbol binding the PLT entries initially point at stubs which invoke the dynamic linker to resolve the relevant symbol. The PLT entry is overwritten with the jump target once it has been resolved. (This latter step happens at startup if non-lazy binding is in effect.)
I believe the PLT stubs may clobber those registers which are none of: (a) C argument registers; (b) C return value registers; (c) C callee-save registers. On x86-64 this set comprises
r10andr11. The backend has correctly avoided these for the passing of arguments for some time. However these registers must also be avoided between OCaml code and the assembly code incaml_call_gcwhich saves registers into thecaml_gc_regsblock, sincecaml_call_gcitself may be called via the PLT. Otherwise what happens is that any live values inr10and/orr11might be clobbered by the PLT stub and then saved into thecaml_gc_regsblock.This bug was responsible for a number of obscure segfaults which @sliquister did a fine job of investigating. Such failures are obscure: the first allocation in a given dynamic library to trigger a GC needs to have something live across it in
r10and/orr11.