Conversation
CarolEidt
commented
Apr 27, 2020
- Allow expanding SIMD12 field to 16 bytes if it is the only field.
- Similarly, if we have a struct with a single field we can reference it as the full size of the struct.
- Eliminate old code for an integer zero RHS with a SIMD12 LHS, and unify the codegen for zero-init for SIMD12 (between block init and assignment of SIMD init of 0)
src/coreclr/src/jit/gentree.h
Outdated
There was a problem hiding this comment.
Is there any need/desire to recognize SIMDIntrinsicGetZero or the equivalent HWIntrinsic kinds?
There was a problem hiding this comment.
SIMDIntrinsicGetZero becomes SIMDIntrinsicInit so that's covered, but we may want to consider the HWIntrinsic kinds if/when we unify their handling in codegen.
|
@dotnet/jit-contrib PTAL - this fixes some code size regressions I encountered in promoting multireg structs that later didn't get enregistered. |
|
The asm diffs from a superpmi run of the tests & frameworks: |
src/coreclr/src/jit/compiler.h
Outdated
There was a problem hiding this comment.
can we check that parent exact size is 16? To protect from cases like:
[StructLayout(LayoutKind.Explicit, Pack = 1)]
public struct A {
[StructLayout(LayoutKind.Explicit, Pack = 1)]
public struct B {
SIMD12 c; // can't be written as 16 bytes.
}
}
Right now we won't generate a LCL_FLD node to access c, but it could change soon.
There was a problem hiding this comment.
Should that be SIMD12 c;?
There was a problem hiding this comment.
Have you verified that a struct such as that will occupy only 12 bytes on the stack? I would be surprised, as I would expect it to always get 16. In any case, it is an inexpensive check to add.
There was a problem hiding this comment.
Actually, I take that back - lvExactSize will be 12 so we will miss the optimization in all cases. But if we are actually packing such a type, e.g. on a 32-bit architecture, lvSize() will return 12. So I believe that the check is correct.
There was a problem hiding this comment.
I'm pretty sure that we rely in many places on lvSize() returning the size that the variable is allocated on the stack.
There was a problem hiding this comment.
If lvSize() == 16 means that it takes 16 bytes on the stack, doesn't it mean that if we map it to SIMD16 it won't overlap with any parent fields that it was not overlapping before?
If the parent struct was promoted it means there were no overlapping fields, so it is not clear for me why we need to check anything except if (varDsc->lvSize() == 16)
There was a problem hiding this comment.
The reason we need to make the other checks is that if it has multiple fields, it is possible (in particular on 32-bit targets) that it would have a 32-bit field following it. This could even happen on 64-bit targets with explicit layout.
There was a problem hiding this comment.
it is possible (in particular on 32-bit targets) that it would have a 32-bit field following it.
do I understand that correctly:
struct C { // size 16 on x86?
SIMD12 a; // offset 0, lvSize() == 16?, lvExactSize == 12
int b; // offset 12? , size == 4
}
how do we allocate the second field starting at 12 after we passed 16 to lvaAllocLocalAndSetVirtualOffset(lclNum, lvaLclSize(lclNum), stkOffs);
There was a problem hiding this comment.
Sorry I missed this comment earlier (I'm still struggling to get green CI testing on this!). We special-case locals of SIMD12 type on 32-bit targets so that they occupy a full 16 bytes, where normally it would only be 12, since that's an even multiple of TARGET_POINTER_SIZE. However, if it is a member of another struct we can't do that (it would violate the struct packing rules). So on a 32-bit target the size of C.a would be 12.
There was a problem hiding this comment.
I've backed out this change - it's not clear from the comment here (and I hadn't remembered) that lvSize() always returns the "fake" size of TYP_SIMD12, even if it may later become a "dependently promoted" local. This was causing the RayTracer benchmark to fail, because the local is 12 bytes but we were storing 16.
cb4ad9a to
2f2896a
Compare
- Allow expanding SIMD12 field to 16 bytes if it is the only field. - Similarly, if we have a struct with a single field we can reference it as the full size of the struct. - Eliminate old code for an integer zero RHS with a SIMD12 LHS, and unify the codegen for zero-init for SIMD12 (between block init and assignment of SIMD init of 0)
ae77566 to
2c93671
Compare