-
Notifications
You must be signed in to change notification settings - Fork 15.8k
[clang][WebAssembly] Fix Wasm Vararg pointer width #173580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: None (daxpedda) ChangesVararg on Wasm64 was using 4 bytes for the pointer instead of 8 bytes. I checked the C ABI in the Wasm tool conventions just to make sure there isn't anything strange going on there, but there wasn't. Would still appreciate somebody double-checking this. Full diff: https://github.com/llvm/llvm-project/pull/173580.diff 1 Files Affected:
diff --git a/clang/lib/CodeGen/Targets/WebAssembly.cpp b/clang/lib/CodeGen/Targets/WebAssembly.cpp
index ebe996a4edd8d..ce97884fa57c4 100644
--- a/clang/lib/CodeGen/Targets/WebAssembly.cpp
+++ b/clang/lib/CodeGen/Targets/WebAssembly.cpp
@@ -160,10 +160,11 @@ RValue WebAssemblyABIInfo::EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
bool IsIndirect = isAggregateTypeForABI(Ty) &&
!isEmptyRecord(getContext(), Ty, true) &&
!isSingleElementStruct(Ty, getContext());
- return emitVoidPtrVAArg(CGF, VAListAddr, Ty, IsIndirect,
- getContext().getTypeInfoInChars(Ty),
- CharUnits::fromQuantity(4),
- /*AllowHigherAlign=*/true, Slot);
+ return emitVoidPtrVAArg(
+ CGF, VAListAddr, Ty, IsIndirect, getContext().getTypeInfoInChars(Ty),
+ CharUnits::fromQuantity(
+ getContext().getTargetInfo().getTriple().isArch64Bit() ? 8 : 4),
+ /*AllowHigherAlign=*/true, Slot);
}
std::unique_ptr<TargetCodeGenInfo>
|
|
I believe this change is incomplete, consider #![feature(c_variadic)]
#[unsafe(no_mangle)]
unsafe extern "C" fn foo(x: ...) -> u32 {
let mut x = core::hint::black_box(x);
let _ = unsafe { x.arg::<u32>() };
unsafe { x.arg::<u32>() }
}
#[unsafe(no_mangle)]
unsafe extern "C" fn bar() -> u32 {
foo(123u32, 456u32)
}This currently produces .type bar,@function
bar:
.functype bar () -> (i32)
.local i64, i32
global.get __stack_pointer
i64.const 16
i64.sub
local.tee 0
global.set __stack_pointer
local.get 0
i64.const 1958505087099
i64.store 0
local.get 0
call foo
local.set 1
local.get 0
i64.const 16
i64.add
global.set __stack_pointer
local.get 1
end_functionwhere I believe that the slot size is mostly relevant for aligning the values in the variable arguments list. That might be less important for wasm, although the alignment of 64-bit integers is 8 on the vast majority of host targets that might run a wasm interpreter. |
a71f47c to
41760ac
Compare
41760ac to
804cd33
Compare
| // CHECK-NEXT: entry: | ||
| // CHECK-NEXT: [[FMT_ADDR:%.*]] = alloca ptr, align 8 | ||
| // CHECK-NEXT: [[VA:%.*]] = alloca ptr, align 8 | ||
| // CHECK-NEXT: [[V:%.*]] = alloca i32, align 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I unfortunately know very little about LLVM IR, but I expected this line to change to align 8, which it did not.
I know the fix definitely works because I simply used Clang to confirm it on a minimal example. But I expected this test to have different input after the change, which it did not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine, it's saying that there is some i32 value on the stack, and that stack allocation has an alignment of 4 (which is appropriate for a 32-bit integer).
What you're aiming to change here is the alignment in the variable argument list, which is not observable when reading just a single variadic argument.
| // CHECK-NEXT: entry: | ||
| // CHECK-NEXT: [[FMT_ADDR:%.*]] = alloca ptr, align 8 | ||
| // CHECK-NEXT: [[VA:%.*]] = alloca ptr, align 8 | ||
| // CHECK-NEXT: [[V:%.*]] = alloca i32, align 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine, it's saying that there is some i32 value on the stack, and that stack allocation has an alignment of 4 (which is appropriate for a 32-bit integer).
What you're aiming to change here is the alignment in the variable argument list, which is not observable when reading just a single variadic argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should also test the calling side, not just the definition (maybe this is done elsewhere already?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I can tell.
I've tried simply adding a calling function but that doesn't work either:
__attribute__((import_module("env"), import_name("test_slot")))
extern void test_slot(unsigned int foo, ...);
void test_u32(void) {
test_slot(123u, 456u, 789u);
}generates the following input:
entry:
call void (i32, ...) @test_slot(i32 noundef 123, i32 noundef 456, i32 noundef 789)
ret voidAFAICT we can't properly test it via LLVM IR. The way I tested it and confirmed it works is by compiling the same code via Clang and then using wasm-tools parse -t to look at the WAT output:
(func $test_u32 (;2;) (type 1)
(local i64)
global.get $__stack_pointer
i64.const 16
i64.sub
local.set 0
local.get 0
global.set $__stack_pointer
local.get 0
i32.const 789
i32.store offset=8
local.get 0
i32.const 456
i32.store
i32.const 123
local.get 0
call $test_slot
local.get 0
i64.const 16
i64.add
global.set $__stack_pointer
return
)With this patch the offset is correctly 8. Just to make sure I tested it without the patch which produces an offset of 4.
If you have any idea how I can test this with the existing LLVM infrastructure please let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that if you make a .ll test file, the CHECKs apply to the generated assembly. at least that is how the other backends seem to work.
|
Based on #175102 (comment), the title is confusing here. What is being adjusted is the slot size on wasm64 (from 4 to 8) to match other 64-bit architectures. |
Looking at #173580 revealed that our testing of varargs is inadequate. This is a start on improving it.
…102) Looking at llvm/llvm-project#173580 revealed that our testing of varargs is inadequate. This is a start on improving it.
|
Thanks for the clarification. So we are changing the minimum size (and alignment with it) of the slots in the vararg buffer, meaning that the slot will always be at least 8 bytes. Previously it was always at least 4, but could be e.g. 8 for i64/f64. In wasm64 there would of course be many more i64s in practice. But, I'm not really understanding where the bug is, or what's causing a problem with Rust. Or, for that matter, what the advantage would be to a larger slot size. It does mean that those now-more-prevalent pointer-sized values will always be aligned on the stack (assuming we are also ensuring that the buffer itself is aligned), allowing aligned loads and stores at the cost of a larger buffer. Is that it? You mentioned consistency with other architectures, but most architectures that I'm aware of (at least, machine architectures) try to use the same calling convention for vararg as for regular calls, so wasm is a bit weird there. |
|
To me, there is no actual issue here: it is purely a question of whether a slot size of 4 was a deliberate decision, it very much looks like it wasn't. A slot size of 4 is perfectly functional, however the 64-bit architectures that I've seen all use a slot size of 8, presumably as you say so that 64-bit values are aligned. So wasm64 can pick a slot size of either 4 or 8, I don't really have strong opinions on that, so long as it is deliberate. If a slot size of 4 is kept I think that's at least worthy of a comment. |
|
OK, fair enough. I just wanted to make sure I understood the situation since the comments at rust-lang/rust#150094 (comment) seemed to suggest there was a bug in LLVM. |
|
It looks like a bug on first glance, and there probably have been many similar issues in the past where someone forgot to update a 4 to an 8 or an In this particular case using a slot size of 4 can actually work, so perhaps this PR was over-eager. From the rust side I think all we'd like to see is a deliberate choice. |
|
OK, I looked at this a little more, and I think I'm understanding the situation better. It sounds like you are considering that each vararg is passed in its own pointer-sized slot in the buffer. But I think the "correct" way to think about it (at least, the way I thought about it when I wrote it originally) is that each item is placed in the buffer at the next available offset according to its ABI size and alignment (the latter of which can be overridden explicitly). But, given that even in the wasm32 ABI, i64s are not exactly rare (and already have the effect of taking a "double slot"), I don't think it really makes sense to think of everything in terms of fixed pointer-sized slots. That would suggest that the wasm64 ABI should continue to be framed in terms of the legalized type of each argument, rather than always rounding up to a consistent size. I just don't really see an advantage to always rounding up, given that it will take more stack size, and won't buy us full consistency of slot size (and each item will remain at least naturally aligned in any case). Having said all that, this did reveal some oddness (possibly a bug) with how vectors and fp128 (or presumably anything that's broken up and passed in multiple arguments) are passed. I need to look at that more closely. It's relevant to the details of your change to WebAssemblyISelLowering, because Flags.getNonZeroOrigAlign() returns the aligment of the overall vector or original type for the first argument, and 1 for the rest; whereas your substitute of the pointer alignment ends up being the minimum slot size. WIth the current lowering of separate slots for each element, it's not necessary to over-align the first element (and I'm not sure that's actually working on the va_arg side), I will have to check on that. |
Looking at llvm/llvm-project#173580 revealed that our testing of varargs is inadequate. This is a start on improving it. (cherry picked from commit 4c61843)
Vararg on Wasm64 was using 4 bytes for the pointer instead of 8 bytes.
I checked the C ABI in the Wasm tool conventions just to make sure there isn't anything strange going on there, but there wasn't. Would still appreciate somebody double-checking this.