[VPlan] Fix LastActiveLane assertion on scalar VF#167897
Conversation
For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:
<x1> vector loop: {
vector.body:
EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
EMIT branch-on-count vp<%index.next>, vp<%2>
No successors
}
Successor(s): middle.block
middle.block:
EMIT vp<%8> = last-active-lane vp<%6>
EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
Successor(s): ir-bb<exit>
The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.
Fixes llvm#167813
|
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Luke Lau (lukel97) ChangesFor a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask: The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe. Fixes #167813 Full diff: https://github.com/llvm/llvm-project/pull/167897.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index e22c5dfdb9f38..c9de9b82bca7c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -66,6 +66,13 @@ bool vputils::isHeaderMask(const VPValue *V, const VPlan &Plan) {
m_One(), m_Specific(&Plan.getVF()))) ||
IsWideCanonicalIV(A));
+ if (match(V,
+ m_ICmp(m_ScalarIVSteps(
+ m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
+ m_One(), m_Specific(&Plan.getVF())),
+ m_Specific(Plan.getBackedgeTakenCount()))))
+ return true;
+
return match(V, m_ICmp(m_VPValue(A), m_VPValue(B))) && IsWideCanonicalIV(A) &&
B == Plan.getBackedgeTakenCount();
}
diff --git a/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
new file mode 100644
index 0000000000000..5964cf45fb6be
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
@@ -0,0 +1,60 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6
+; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s
+
+define i64 @live_out_scalar_vf(i64 %n) {
+; CHECK-LABEL: define i64 @live_out_scalar_vf(
+; CHECK-SAME: i64 [[N:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], 1
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[TMP0]], 1
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP0]], 1
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 1
+; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i64 [[TMP1]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT: [[TMP4:%.*]] = icmp ugt i64 [[TMP2]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
+; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i1 [[TMP4]], false
+; CHECK-NEXT: [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+; CHECK-NEXT: [[TMP8:%.*]] = add i64 1, [[TMP7]]
+; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i1 [[TMP3]], false
+; CHECK-NEXT: [[TMP10:%.*]] = zext i1 [[TMP9]] to i64
+; CHECK-NEXT: [[TMP11:%.*]] = add i64 0, [[TMP10]]
+; CHECK-NEXT: [[TMP12:%.*]] = icmp ne i64 [[TMP10]], 1
+; CHECK-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i64 [[TMP11]], i64 [[TMP8]]
+; CHECK-NEXT: [[LAST_ACTIVE_LANE:%.*]] = sub i64 [[TMP13]], 1
+; CHECK-NEXT: [[TMP14:%.*]] = sub i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT: [[TMP15:%.*]] = icmp uge i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT: [[TMP16:%.*]] = select i1 [[TMP15]], i64 [[TMP2]], i64 [[TMP1]]
+; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret i64 [[TMP16]]
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
+ br label %latch
+
+latch:
+ ; Need to use a phi otherwise the header mask will use a
+ ; VPWidenCanonicalIVRecipe instead of a VPScalarIVStepsRecipe.
+ %exitval = phi i64 [ %iv, %loop ]
+ %iv.next = add i64 %iv, 1
+ %ec = icmp eq i64 %iv, %n
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ ret i64 %exitval
+}
+
|
|
|
||
| loop: | ||
| %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ] | ||
| br label %latch |
There was a problem hiding this comment.
Can we turn this into a non-trivial branch?
There was a problem hiding this comment.
I was able to remove the branch entirely in 3863112, and the crash is still reproducable
| m_One(), m_Specific(&Plan.getVF()))) || | ||
| IsWideCanonicalIV(A)); | ||
|
|
||
| if (match(V, |
There was a problem hiding this comment.
| if (match(V, | |
| // For scalar plans, the header mask uses the scalar steps. | |
| if (match(V, |
…ze/fix-tail-folded-live-out-scalar-vf
|
I see that #149042 was reverted but I've updated this on main if you'd like to land this separately anyway. |
|
|
||
| // For scalar plans, the header mask uses the scalar steps. | ||
| if (match(V, | ||
| m_ICmp(m_ScalarIVSteps( |
There was a problem hiding this comment.
Looks like this pattern is now used twice in this function, i.e.
m_ScalarIVSteps(m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
m_One(), m_Specific(&Plan.getVF()))
Is it worth creating a specific pattern match for this? Something like
m_CanonicalScalarIVSteps(Plan)
?
There was a problem hiding this comment.
Good point, I've added a helper in efca5e5
…ze/fix-tail-folded-live-out-scalar-vf
arcbbb
left a comment
There was a problem hiding this comment.
Looks good to me. Thanks for the fix!
| return Expanded; | ||
| } | ||
|
|
||
| static inline auto m_CanonicalScalarIVSteps(const VPlan &Plan) { |
There was a problem hiding this comment.
If it’s only used in a single function, you can re-use by assigning the pattern to a variable I think?
There was a problem hiding this comment.
If it stays a static function here, you can drop the inline I think.
There was a problem hiding this comment.
Yup, turned it into a variable in c0a7067
| ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6 | ||
| ; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s | ||
|
|
||
| define i64 @live_out_scalar_vf(i64 %n) { |
There was a problem hiding this comment.
if possible, might be good to add this to llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll, which already contains a set of test cases for interleaving only with tail-folding.
Needed to add -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue flag to run lines. Looks like the existing tests weren't necessarily tail folded?
…folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * #169298 * #167897 * #168949 Original message: Building on top of #148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also #148603 PR: #149042
… when tail-folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm/llvm-project#169298 * llvm/llvm-project#167897 * llvm/llvm-project#168949 Original message: Building on top of llvm/llvm-project#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm/llvm-project#148603 PR: llvm/llvm-project#149042
…folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * #169298 * #167897 * #168949 Original message: Building on top of #148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also #148603 PR: #149042
… when tail-folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm/llvm-project#169298 * llvm/llvm-project#167897 * llvm/llvm-project#168949 Original message: Building on top of llvm/llvm-project#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm/llvm-project#148603 PR: llvm/llvm-project#149042
…folding. (llvm#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm#169298 * llvm#167897 * llvm#168949 Original message: Building on top of llvm#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm#148603 PR: llvm#149042
…folding. (llvm#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm#169298 * llvm#167897 * llvm#168949 Original message: Building on top of llvm#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm#148603 PR: llvm#149042
For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:
The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.
Fixes #167813