[AArch64] All bits of an exact right shift are demanded#97448
[AArch64] All bits of an exact right shift are demanded#97448momchil-velikov merged 2 commits intollvm:mainfrom
Conversation
When building a vector which contains zero elements, the AArch64 ISel replaces those elements with `undef`, if they are right shifted out. However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior. Change-Id: I8d7eb1964ebbe88a90568be7805a29a72edd89e1
|
@llvm/pr-subscribers-backend-aarch64 Author: Momchil Velikov (momchil-velikov) ChangesWhen building a vector which contains zero elements, the AArch64 ISel replaces those elements with However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior. Should allow #92528 to be recommitted. Full diff: https://github.com/llvm/llvm-project/pull/97448.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e0c3cc5eddb82..c7b2a21e6ed58 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -22142,6 +22142,10 @@ static SDValue performVectorShiftCombine(SDNode *N,
if (DCI.DAG.ComputeNumSignBits(Op.getOperand(0)) > ShiftImm)
return Op.getOperand(0);
+ // If the right shift is exact, the shifted out bits matter.
+ if (N->getOpcode() == AArch64ISD::VASHR && N->getFlags().hasExact())
+ return SDValue();
+
APInt ShiftedOutBits = APInt::getLowBitsSet(OpScalarSize, ShiftImm);
APInt DemandedMask = ~ShiftedOutBits;
diff --git a/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll b/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll
new file mode 100644
index 0000000000000..9f877dffe1ab5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+target triple = "aarch64-linux"
+
+define <2 x float> @f(i8 %0, i8 %1) {
+; CHECK-LABEL: f:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: mov v0.b[3], w0
+; CHECK-NEXT: mov v0.b[7], w1
+; CHECK-NEXT: scvtf v0.2s, v0.2s, #24
+; CHECK-NEXT: ret
+ %3 = insertelement <2 x i8> poison, i8 %0, i64 0
+ %4 = insertelement <2 x i8> %3, i8 %1, i64 1
+ %5 = shufflevector <2 x i8> %4, <2 x i8> <i8 0, i8 poison>, <8 x i32> <i32 2, i32 2, i32 2, i32 0, i32 2, i32 2, i32 2, i32 1>
+ %6 = bitcast <8 x i8> %5 to <2 x i32>
+ %7 = ashr exact <2 x i32> %6, <i32 24, i32 24>
+ %8 = sitofp <2 x i32> %7 to <2 x float>
+ ret <2 x float> %8
+}
|
davemgreen
left a comment
There was a problem hiding this comment.
I see.. because we need the bits to be 0's. This sounds good, but should we also apply this to VLSHR too? The code for SRA looks like this (but we are not using the input DemandBits here), and SRL has the hasExact part.
// If the shift is exact, then it does demand the low bits (and knows that
// they are zero).
if (Op->getFlags().hasExact())
InDemandedMask.setLowBits(ShAmt);
// If any of the demanded bits are produced by the sign extension, we also
// demand the input sign bit.
if (DemandedBits.countl_zero() < ShAmt)
InDemandedMask.setSignBit();
Change-Id: I5ac39bf06434cb4029ec8482bd54b2f756bb6af8
| if (DCI.DAG.ComputeNumSignBits(Op.getOperand(0)) > ShiftImm) | ||
| return Op.getOperand(0); | ||
|
|
||
| // If the right shift is exact, the shifted out bits matter. |
There was a problem hiding this comment.
nit: perhaps elaborate a bit on why the shifted out bits matter. E.g. "because we may fold the right shift with other instructions and thus require the lower bits to be set."
Now that gets me thinking. Wouldn't this condition than make code that does not fold the right shift with other instructions less efficient? Shouldn't the onus be on wherever folds right shift with sitofp (scvtf) to make sure to clear the lower bits?
There was a problem hiding this comment.
(Somehow didn't send this out before -- posting for posterity.)
There was a problem hiding this comment.
I wrote that in the commit message
However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.
When building a vector which contains zero elements, the AArch64 ISel replaces those elements with
undef, if they are right shifted out.However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.
Should allow #92528 to be recommitted.