Skip to content

[s390x, abi_impl] Add i128 support#4598

Merged
cfallin merged 1 commit intobytecodealliance:mainfrom
uweigand:s390x-i128
Aug 4, 2022
Merged

[s390x, abi_impl] Add i128 support#4598
cfallin merged 1 commit intobytecodealliance:mainfrom
uweigand:s390x-i128

Conversation

@uweigand
Copy link
Copy Markdown
Member

@uweigand uweigand commented Aug 3, 2022

This adds full i128 support to the s390x target, including new filetests
and enabling the existing i128 runtest on s390x.

The ABI requires that i128 is passed and returned via implicit pointer,
but the front end still generates direct i128 types in call. This means
we have to implement ABI support to implicitly convert i128 types to
pointers when passing arguments.

To do so, we add a new variant ABIArg::ImplicitArg. This acts like
StructArg, except that the value type is the actual target type,
not a pointer type. The required conversions have to be inserted
in the prologue and at function call sites.

Note that when dereferencing the implicit pointer in the prologue,
we may require a temp register: the pointer may be passed on the
stack so it needs to be loaded first, but the value register may
be in the wrong class for pointer values. In this case, we use
the "stack limit" register, which should be available at this
point in the prologue.

For return values, we use a mechanism similar to the one used for
supporting multiple return values in the Wasmtime ABI. The only
difference is that the hidden pointer to the return buffer must
be the first, not last, argument in this case.

(FYI @cfallin - This implements the second half of issue #4565.)

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. labels Aug 3, 2022
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a bunch for this large amount of work!

I have a few nits below asking for a few comments, and possibly renaming one thing, but. otherwise this looks fine. I didn't look at all of the new lowered sequences in detail but the runtests are the best arbiter of correctness there so I will lean on the fact that tests pass!

// If CC != 0, we'd done, so jump over the next instruction.
let opcode = 0xa74; // BCR
put(sink, &enc_ri_c(opcode, 7, 4 + 6));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment somewhere in here that we write tmp only after the last reads of rn and rm, as required by the semantics of ordinary (early) uses and ordinary (late) defs? And perhaps a note that if this changes, tmp has to become an early def. (I mostly want to future-proof against a subtle mistake later.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/// Implicit argument. Similar to a StructArg, except that we have the
/// target type, not a pointer type, at the CLIF-level. This argument is
/// still being passed via reference implicitly.
ImplicitArg {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this an ImplicitPtrArg or similar? Just ImplicitArg seems slightly ambiguous to me; like the arg itself is implicitly generated at callsites or something like that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

&ABIArgSlot::Reg { reg, .. } => Reg::from(reg),
&ABIArgSlot::Stack { offset, ty, .. } => {
// In this case we need a temp register to hold the address.
// Use the stack limit register as temp.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternative I suppose we could provide an "allocate temp vreg" callback to gen_copy_arg_to_regs; this codegen happens before regalloc so it is still an option to create vregs here. Perhaps a comment here to record the altnernate possibility? (We may in the future want to remove the need for more dedicated non-allocatable registers, at which point it may become more relevant.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that's better. However, I wasn't able to get a callback past the borrow checker - maybe I'm just not experienced enough with Rust, but it does look a real problem:

                for insn in self.vcode.abi().gen_copy_arg_to_regs(i, regs).into_iter() {

This already holds a borrow on self during the gen_copy_arg_to_regs call. Passing a closure would require another (mutable) borrow on self, which isn't allowed.

Instead, I noticed that there is already a mechanism to pass a temp reg to ABICallee via the temp_needed/init mechanism, so I simply extended that to support multiple temps.

@@ -0,0 +1,58 @@
test run
target aarch64
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a note here that s390x is not yet supported because (reason)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

This adds full i128 support to the s390x target, including new filetests
and enabling the existing i128 runtest on s390x.

The ABI requires that i128 is passed and returned via implicit pointer,
but the front end still generates direct i128 types in call.  This means
we have to implement ABI support to implicitly convert i128 types to
pointers when passing arguments.

To do so, we add a new variant ABIArg::ImplicitArg.  This acts like
StructArg, except that the value type is the actual target type,
not a pointer type.  The required conversions have to be inserted
in the prologue and at function call sites.

Note that when dereferencing the implicit pointer in the prologue,
we may require a temp register: the pointer may be passed on the
stack so it needs to be loaded first, but the value register may
be in the wrong class for pointer values.  In this case, we use
the "stack limit" register, which should be available at this
point in the prologue.

For return values, we use a mechanism similar to the one used for
supporting multiple return values in the Wasmtime ABI.  The only
difference is that the hidden pointer to the return buffer must
be the *first*, not last, argument in this case.

(This implements the second half of issue bytecodealliance#4565.)
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

@cfallin cfallin enabled auto-merge (squash) August 4, 2022 20:16
@cfallin cfallin merged commit b17b1eb into bytecodealliance:main Aug 4, 2022
@uweigand uweigand deleted the s390x-i128 branch August 4, 2022 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants