In implementing support for struct-args and i128 to enable rustc_codegen_cranelift on s390x, I ran into two features that are not currently available with the common abi_impl implementation.
Struct arguments using an explicit buffer address
Many commonly used ABIs have special provisions for (large) struct arguments that cannot be passed in registers. However, there are two typical flavors of these, and abi_impl only supports one of them.
Both these flavors have in common that the actual argument is copied by the caller into a buffer on the stack; the callee will then read the argument from that buffer, and is even free to modify it. However, the two flavors differ in how the address of that buffer is communicated from the caller to the callee:
- Implicit buffer address. With this flavor, that ABI implicitly defines the location of the buffer, usually by making the buffer part of the same area used for stack slots for overflow scalar arguments. Both caller and callee can compute the location of the buffer at compile time simply based on the function call signature; the address is not materialized as part of the run-time call sequence.
- Explicit buffer address. Here, the ABI does not define the location of the buffer itself, but leaves the compiler free to place it anywhere. The actual location of the buffer is passed at run-time from caller to callee, typically in a register (or overflow stack slot) as if we had a pointer at this place in the function call signature.
There are pros and cons to either approach, leading to different choices of method across platform ABIs. In particular, x86_64 nearly always uses the implicit buffer address method, with the only exception of structures whose size is not a compile-time constant. Many other platforms, including s390x, use the the explicit buffer address method always, which allows the opportunity to the compiler to avoid actually copying the structure, e.g. if the original copy can be shown to be unused after the call so it wouldn't matter if the callee clobbers its contents.
However, the current cranelift abi_impl only supports the implicit buffer address method. This should mostly be OK for x64_64 - I'm not sure if variable-sized structures passed by value even exist with the rustc frontend. However, even for aarch64 I believe the current implementation is already incorrect - as far as I can see, GCC and LLVM use the explicit buffer address method on this architecture. This is definitely also the case on s390x.
This should be straightforward to fix, and I already have an implementation. The main change is to the ABIArg data type, which currently holds either a vector of ABISlots (registers or overflow stack slots) or a StructArg identifying a buffer location on the stack. I've changed this to optionally provide both these items - where the semantics of providing both means that the ABISlot identifies the register (or overflow stack slot) needed to hold the address of the buffer at run time.
Type legalization in the back-end
If a function signature contains an argument of a type that cannot be directly passed in registers or overflow stack slots, but needs to use the (explicit or implicit) struct-arg buffer method, the function signature needs to be legalized: this consists of replacing the argument of struct type by a pointer to that struct type, and adding code around function call sites as well as prologue and epilogue to convert between the two (taking the address of the struct, or dereferencing the pointer, as appropriate).
Now, usually this legalization is already done by the language front-end (whether clang or rustc), and by the time the (LLVM or cranelift) IR gets to the back end, we already have a pointer type. And in fact the current abi_impl assumes that exactly this is the case.
But this is unfortunately not universally true. There are a few cases whether the front end does not perform any legalization, but rather relies on the (currently LLVM) back end for that. This is the case in particular for the i128 data type on s390x. This remains as a scalar integer type in function call signatures, but is actually implemented using a hidden buffer like the struct-arg case. With LLVM, this legalization is performed in the back-end - and to implement i128 in cranelift, we probably need to do the same here.
Interestingly enough, the AbiParam data type in cranelift contains a legalized_to_pointer flag which would appear to address exactly this situation. However, as far as I can tell this flag is currently never set or used in all of cranelift. It is not clear to me how this was originally intended to be used, or where exactly this legalization would have been done.
I have implemented the functionality for now using yet another flavor of the ABIArg data type, but that seems a bit of a hack. One problem is that when implementing the pointer dereferencing in the prologue, the ABIArg slots only carry type information for the pointer (usually I64), and possibly the size of the buffer, but not the type of the buffer, so it is a bit unclear which instruction to use to implement the dereferencing.
Another problem is that if the buffer address is passed explicitly, but in a stack slot, we require a temporary register to load that address in order to dereference it (since the into_regs will hold the final i128 value, which lives in the vector register set on s390x, it cannot be used as address). It's not quite clear how to allocate a temp at this stage in the compilation process. (I'm using the "stack limit register" right now, which appears to work, but I'm not sure this is guaranteed to be OK on all platforms.)
Finally, this legalization needs to be done not just for function arguments, but also return values. Here, abi_impl already has code to allocate a buffer in the caller and pass its address to the callee so it can place the return value there. This is exactly what we need on s390x for a i128 return value - except for one implementation detail: cranelift passes that implicit pointer as last argument (because this is expected by wasmtime when used for the multiple return value extension), while the platform ABI requires the implicit pointer as first argument. To work around this, I'm now passing the return pointer first if the enable_llvm_abi_extensions flag is true, and last otherwise. This seems to work with both wasmtime and rustc_codegen_cranelift, but also feels like a hack.
@cfallin these are the problems I mentioned in our last meeting. I'd be happy to implement a solution, but I'd appreciate some guidance / discussion on what the correct approach should be.
In implementing support for struct-args and i128 to enable
rustc_codegen_cranelifton s390x, I ran into two features that are not currently available with the commonabi_implimplementation.Struct arguments using an explicit buffer address
Many commonly used ABIs have special provisions for (large) struct arguments that cannot be passed in registers. However, there are two typical flavors of these, and
abi_implonly supports one of them.Both these flavors have in common that the actual argument is copied by the caller into a buffer on the stack; the callee will then read the argument from that buffer, and is even free to modify it. However, the two flavors differ in how the address of that buffer is communicated from the caller to the callee:
There are pros and cons to either approach, leading to different choices of method across platform ABIs. In particular,
x86_64nearly always uses the implicit buffer address method, with the only exception of structures whose size is not a compile-time constant. Many other platforms, includings390x, use the the explicit buffer address method always, which allows the opportunity to the compiler to avoid actually copying the structure, e.g. if the original copy can be shown to be unused after the call so it wouldn't matter if the callee clobbers its contents.However, the current cranelift
abi_implonly supports the implicit buffer address method. This should mostly be OK forx64_64- I'm not sure if variable-sized structures passed by value even exist with the rustc frontend. However, even foraarch64I believe the current implementation is already incorrect - as far as I can see, GCC and LLVM use the explicit buffer address method on this architecture. This is definitely also the case ons390x.This should be straightforward to fix, and I already have an implementation. The main change is to the
ABIArgdata type, which currently holds either a vector ofABISlots (registers or overflow stack slots) or aStructArgidentifying a buffer location on the stack. I've changed this to optionally provide both these items - where the semantics of providing both means that theABISlotidentifies the register (or overflow stack slot) needed to hold the address of the buffer at run time.Type legalization in the back-end
If a function signature contains an argument of a type that cannot be directly passed in registers or overflow stack slots, but needs to use the (explicit or implicit) struct-arg buffer method, the function signature needs to be legalized: this consists of replacing the argument of struct type by a pointer to that struct type, and adding code around function call sites as well as prologue and epilogue to convert between the two (taking the address of the struct, or dereferencing the pointer, as appropriate).
Now, usually this legalization is already done by the language front-end (whether clang or rustc), and by the time the (LLVM or cranelift) IR gets to the back end, we already have a pointer type. And in fact the current
abi_implassumes that exactly this is the case.But this is unfortunately not universally true. There are a few cases whether the front end does not perform any legalization, but rather relies on the (currently LLVM) back end for that. This is the case in particular for the
i128data type ons390x. This remains as a scalar integer type in function call signatures, but is actually implemented using a hidden buffer like the struct-arg case. With LLVM, this legalization is performed in the back-end - and to implementi128in cranelift, we probably need to do the same here.Interestingly enough, the
AbiParamdata type in cranelift contains alegalized_to_pointerflag which would appear to address exactly this situation. However, as far as I can tell this flag is currently never set or used in all of cranelift. It is not clear to me how this was originally intended to be used, or where exactly this legalization would have been done.I have implemented the functionality for now using yet another flavor of the
ABIArgdata type, but that seems a bit of a hack. One problem is that when implementing the pointer dereferencing in the prologue, theABIArgslots only carry type information for the pointer (usuallyI64), and possibly the size of the buffer, but not the type of the buffer, so it is a bit unclear which instruction to use to implement the dereferencing.Another problem is that if the buffer address is passed explicitly, but in a stack slot, we require a temporary register to load that address in order to dereference it (since the
into_regswill hold the finali128value, which lives in the vector register set ons390x, it cannot be used as address). It's not quite clear how to allocate a temp at this stage in the compilation process. (I'm using the "stack limit register" right now, which appears to work, but I'm not sure this is guaranteed to be OK on all platforms.)Finally, this legalization needs to be done not just for function arguments, but also return values. Here,
abi_implalready has code to allocate a buffer in the caller and pass its address to the callee so it can place the return value there. This is exactly what we need ons390xfor ai128return value - except for one implementation detail: cranelift passes that implicit pointer as last argument (because this is expected by wasmtime when used for the multiple return value extension), while the platform ABI requires the implicit pointer as first argument. To work around this, I'm now passing the return pointer first if theenable_llvm_abi_extensionsflag is true, and last otherwise. This seems to work with both wasmtime and rustc_codegen_cranelift, but also feels like a hack.@cfallin these are the problems I mentioned in our last meeting. I'd be happy to implement a solution, but I'd appreciate some guidance / discussion on what the correct approach should be.