Missing ABI features required for struct-args and i128 on s390x

In implementing support for struct-args and i128 to enable `rustc_codegen_cranelift` on s390x, I ran into two features that are not currently available with the common `abi_impl` implementation.

---

**Struct arguments using an explicit buffer address**

Many commonly used ABIs have special provisions for (large) struct arguments that cannot be passed in registers.  However, there are two typical flavors of these, and `abi_impl` only supports one of them.

Both these flavors have in common that the actual argument is copied by the caller into a buffer on the stack; the callee will then read the argument from that buffer, and is even free to modify it.  However, the two flavors differ in how the address of that buffer is _communicated_ from the caller to the callee:

1. Implicit buffer address.   With this flavor, that ABI implicitly defines the location of the buffer, usually by making the buffer part of the same area used for stack slots for overflow scalar arguments.  Both caller and callee can compute the location of the buffer at compile time simply based on the function call signature; the address is not materialized as part of the run-time call sequence.
2. Explicit buffer address.  Here, the ABI does not define the location of the buffer itself, but leaves the compiler free to place it anywhere.  The actual location of the buffer is passed at run-time from caller to callee, typically in a register (or overflow stack slot) as if we had a pointer at this place in the function call signature.

There are pros and cons to either approach, leading to different choices of method across platform ABIs.  In particular, `x86_64` nearly always uses the implicit buffer address method, with the only exception of structures whose size is not a compile-time constant.  Many other platforms, including `s390x`, use the the explicit buffer address method always, which allows the opportunity to the compiler to avoid actually copying the structure, e.g. if the original copy can be shown to be unused after the call so it wouldn't matter if the callee clobbers its contents.

However, the current cranelift `abi_impl` _only_ supports the implicit buffer address method.   This should mostly be OK for `x64_64` - I'm not sure if variable-sized structures passed by value even exist with the rustc frontend.   However, even for `aarch64` I believe the current implementation is already incorrect - as far as I can see, GCC and LLVM use the explicit buffer address method on this architecture.    This is definitely also the case on `s390x`.

This should be straightforward to fix, and I already have an implementation.   The main change is to the `ABIArg` data type, which currently holds *either* a vector of `ABISlot`s (registers or overflow stack slots) *or* a `StructArg` identifying a buffer location on the stack.  I've changed this to optionally provide *both* these items - where the semantics of providing both means that the `ABISlot` identifies the register (or overflow stack slot) needed to hold the address of the buffer at run time.

---

**Type legalization in the back-end**

If a function signature contains an argument of a type that cannot be directly passed in registers or overflow stack slots, but needs to use the (explicit or implicit) struct-arg buffer method,  the function signature needs to be legalized: this consists of replacing the argument of struct type by a **pointer** to that struct type, and adding code around function call sites as well as prologue and epilogue to convert between the two (taking the address of the struct, or dereferencing the pointer, as appropriate).

Now, usually this legalization is already done by the language front-end (whether clang or rustc), and by the time the (LLVM or cranelift) IR gets to the back end, we already have a pointer type.  And in fact the current `abi_impl` assumes that exactly this is the case.

But this is unfortunately not universally true.  There are a few cases whether the front end does not perform any legalization, but rather relies on the (currently LLVM) back end for that.  This is the case in particular for the `i128` data type on `s390x`.   This remains as a scalar integer type in function call signatures, but is actually _implemented_ using a hidden buffer like the struct-arg case.   With LLVM, this legalization is performed in the back-end - and to implement `i128` in cranelift, we probably need to do the same here.

Interestingly enough, the `AbiParam` data type in cranelift contains a `legalized_to_pointer` flag which would appear to address exactly this situation.  However, as far as I can tell this flag is currently never set or used in all of cranelift.   It is not clear to me how this was originally intended to be used, or where exactly this legalization would have been done.

I have implemented the functionality for now using yet another flavor of the `ABIArg` data type, but that seems a bit of a hack.  One problem is that when implementing the pointer dereferencing in the prologue, the `ABIArg` slots only carry type information for the pointer (usually `I64`), and possibly the size of the buffer, but not the *type* of the buffer, so it is a bit unclear which instruction to use to implement the dereferencing.

Another problem is that if the buffer address is passed explicitly, but in a stack slot, we require a temporary register to load that address in order to dereference it (since the `into_regs` will hold the final `i128` value, which lives in the vector register set on `s390x`, it cannot be used as address).   It's not quite clear how to allocate a temp at this stage in the compilation process.  (I'm using the "stack limit register" right now, which appears to work, but I'm not sure this is guaranteed to be OK on all platforms.)

Finally, this legalization needs to be done not just for function arguments, but also *return values*.   Here, `abi_impl` already has code to allocate a buffer in the caller and pass its address to the callee so it can place the return value there.  This is exactly what we need on `s390x` for a `i128` return value - except for one implementation detail: cranelift passes that implicit pointer as *last* argument (because this is expected by wasmtime when used for the multiple return value extension), while the platform ABI requires the implicit pointer as *first* argument.   To work around this, I'm now passing the return pointer first if the `enable_llvm_abi_extensions` flag is true, and last otherwise.  This seems to work with both wasmtime and rustc_codegen_cranelift, but also feels like a hack.

---

@cfallin these are the problems I mentioned in our last meeting.  I'd be happy to implement a solution, but I'd appreciate some guidance / discussion on what the correct approach should be.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing ABI features required for struct-args and i128 on s390x #4565

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Missing ABI features required for struct-args and i128 on s390x #4565

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions