[VPlan] Fix LastActiveLane assertion on scalar VF by lukel97 · Pull Request #167897 · llvm/llvm-project

lukel97 · 2025-11-13T16:10:57Z

For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:

<x1> vector loop: {
  vector.body:
    EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
    vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
    EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
    EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
    EMIT branch-on-count vp<%index.next>, vp<%2>
  No successors
}
Successor(s): middle.block

middle.block:
  EMIT vp<%8> = last-active-lane vp<%6>
  EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
Successor(s): ir-bb<exit>

The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.

Fixes #167813

For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask: <x1> vector loop: { vector.body: EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next> vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0> EMIT vp<%6> = icmp ule vp<%5>, vp<%3> EMIT vp<%index.next> = add nuw vp<%4>, vp<%1> EMIT branch-on-count vp<%index.next>, vp<%2> No successors } Successor(s): middle.block middle.block: EMIT vp<%8> = last-active-lane vp<%6> EMIT vp<%9> = extract-lane vp<%8>, vp<%5> Successor(s): ir-bb<exit> The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe. Fixes llvm#167813

llvmbot · 2025-11-13T16:11:36Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:

&lt;x1&gt; vector loop: {
  vector.body:
    EMIT vp&lt;%4&gt; = CANONICAL-INDUCTION ir&lt;0&gt;, vp&lt;%index.next&gt;
    vp&lt;%5&gt; = SCALAR-STEPS vp&lt;%4&gt;, ir&lt;1&gt;, vp&lt;%0&gt;
    EMIT vp&lt;%6&gt; = icmp ule vp&lt;%5&gt;, vp&lt;%3&gt;
    EMIT vp&lt;%index.next&gt; = add nuw vp&lt;%4&gt;, vp&lt;%1&gt;
    EMIT branch-on-count vp&lt;%index.next&gt;, vp&lt;%2&gt;
  No successors
}
Successor(s): middle.block

middle.block:
  EMIT vp&lt;%8&gt; = last-active-lane vp&lt;%6&gt;
  EMIT vp&lt;%9&gt; = extract-lane vp&lt;%8&gt;, vp&lt;%5&gt;
Successor(s): ir-bb&lt;exit&gt;

The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.

Fixes #167813

Full diff: https://github.com/llvm/llvm-project/pull/167897.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanUtils.cpp (+7)
(added) llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll (+60)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index e22c5dfdb9f38..c9de9b82bca7c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -66,6 +66,13 @@ bool vputils::isHeaderMask(const VPValue *V, const VPlan &Plan) {
                       m_One(), m_Specific(&Plan.getVF()))) ||
             IsWideCanonicalIV(A));
 
+  if (match(V,
+            m_ICmp(m_ScalarIVSteps(
+                       m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
+                       m_One(), m_Specific(&Plan.getVF())),
+                   m_Specific(Plan.getBackedgeTakenCount()))))
+    return true;
+
   return match(V, m_ICmp(m_VPValue(A), m_VPValue(B))) && IsWideCanonicalIV(A) &&
          B == Plan.getBackedgeTakenCount();
 }
diff --git a/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
new file mode 100644
index 0000000000000..5964cf45fb6be
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
@@ -0,0 +1,60 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6
+; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s
+
+define i64 @live_out_scalar_vf(i64 %n) {
+; CHECK-LABEL: define i64 @live_out_scalar_vf(
+; CHECK-SAME: i64 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = add i64 [[N]], 1
+; CHECK-NEXT:    br label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 [[TMP0]], 1
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP0]], 1
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP1:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[INDEX]], 1
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp ugt i64 [[TMP1]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ugt i64 [[TMP2]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i1 [[TMP4]], false
+; CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+; CHECK-NEXT:    [[TMP8:%.*]] = add i64 1, [[TMP7]]
+; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i1 [[TMP3]], false
+; CHECK-NEXT:    [[TMP10:%.*]] = zext i1 [[TMP9]] to i64
+; CHECK-NEXT:    [[TMP11:%.*]] = add i64 0, [[TMP10]]
+; CHECK-NEXT:    [[TMP12:%.*]] = icmp ne i64 [[TMP10]], 1
+; CHECK-NEXT:    [[TMP13:%.*]] = select i1 [[TMP12]], i64 [[TMP11]], i64 [[TMP8]]
+; CHECK-NEXT:    [[LAST_ACTIVE_LANE:%.*]] = sub i64 [[TMP13]], 1
+; CHECK-NEXT:    [[TMP14:%.*]] = sub i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT:    [[TMP15:%.*]] = icmp uge i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT:    [[TMP16:%.*]] = select i1 [[TMP15]], i64 [[TMP2]], i64 [[TMP1]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i64 [[TMP16]]
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
+  br label %latch
+
+latch:
+  ; Need to use a phi otherwise the header mask will use a
+  ; VPWidenCanonicalIVRecipe instead of a VPScalarIVStepsRecipe.
+  %exitval = phi i64 [ %iv, %loop ]
+  %iv.next = add i64 %iv, 1
+  %ec = icmp eq i64 %iv, %n
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i64 %exitval
+}
+

fhahn · 2025-11-13T16:33:13Z

+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
+  br label %latch


Can we turn this into a non-trivial branch?

I was able to remove the branch entirely in 3863112, and the crash is still reproducable

fhahn · 2025-11-13T18:41:09Z

                      m_One(), m_Specific(&Plan.getVF()))) ||
            IsWideCanonicalIV(A));

+  if (match(V,


Suggested change

if (match(V,

// For scalar plans, the header mask uses the scalar steps.

if (match(V,

Done in 23bce03

…ze/fix-tail-folded-live-out-scalar-vf

lukel97 · 2025-11-14T07:00:53Z

I see that #149042 was reverted but I've updated this on main if you'd like to land this separately anyway.

david-arm · 2025-11-14T15:28:59Z


+  // For scalar plans, the header mask uses the scalar steps.
+  if (match(V,
+            m_ICmp(m_ScalarIVSteps(


Looks like this pattern is now used twice in this function, i.e.

m_ScalarIVSteps(m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()), m_One(), m_Specific(&Plan.getVF()))

Is it worth creating a specific pattern match for this? Something like

m_CanonicalScalarIVSteps(Plan)

?

Good point, I've added a helper in efca5e5

…ze/fix-tail-folded-live-out-scalar-vf

arcbbb

Looks good to me. Thanks for the fix!

fhahn · 2025-11-17T08:47:33Z

  return Expanded;
 }

+static inline auto m_CanonicalScalarIVSteps(const VPlan &Plan) {


If it’s only used in a single function, you can re-use by assigning the pattern to a variable I think?

If it stays a static function here, you can drop the inline I think.

Yup, turned it into a variable in c0a7067

fhahn

LGTM, thanks

fhahn · 2025-11-17T09:50:53Z

+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6
+; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s
+
+define i64 @live_out_scalar_vf(i64 %n) {


if possible, might be good to add this to llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll, which already contains a set of test cases for interleaving only with tail-folding.

Done in c29428f

Needed to add -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue flag to run lines. Looks like the existing tests weren't necessarily tail folded?

…folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * #169298 * #167897 * #168949 Original message: Building on top of #148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also #148603 PR: #149042

… when tail-folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm/llvm-project#169298 * llvm/llvm-project#167897 * llvm/llvm-project#168949 Original message: Building on top of llvm/llvm-project#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm/llvm-project#148603 PR: llvm/llvm-project#149042

…folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * #169298 * #167897 * #168949 Original message: Building on top of #148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also #148603 PR: #149042

… when tail-folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm/llvm-project#169298 * llvm/llvm-project#167897 * llvm/llvm-project#168949 Original message: Building on top of llvm/llvm-project#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm/llvm-project#148603 PR: llvm/llvm-project#149042

…folding. (llvm#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm#169298 * llvm#167897 * llvm#168949 Original message: Building on top of llvm#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm#148603 PR: llvm#149042

lukel97 requested review from arcbbb, david-arm and fhahn November 13, 2025 16:10

llvmbot added vectorizers llvm:transforms labels Nov 13, 2025

fhahn reviewed Nov 13, 2025

View reviewed changes

Remove trivial branch

3863112

fhahn reviewed Nov 13, 2025

View reviewed changes

lukel97 added 3 commits November 14, 2025 14:48

Add comment and assert

23bce03

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

81b7721

…ze/fix-tail-folded-live-out-scalar-vf

Update test after merging

67bb402

lukel97 mentioned this pull request Nov 14, 2025

[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. #149042

Merged

david-arm reviewed Nov 14, 2025

View reviewed changes

lukel97 added 2 commits November 17, 2025 11:47

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

602ca13

…ze/fix-tail-folded-live-out-scalar-vf

Add m_CanonicalScalarIVSteps helper

efca5e5

arcbbb reviewed Nov 17, 2025

View reviewed changes

fhahn reviewed Nov 17, 2025

View reviewed changes

Move to variable

c0a7067

fhahn approved these changes Nov 17, 2025

View reviewed changes

Move test to tail-folding-vectorization-factor-1.ll

c29428f

Needed to add -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue flag to run lines. Looks like the existing tests weren't necessarily tail folded?

lukel97 enabled auto-merge (squash) November 17, 2025 10:34

lukel97 merged commit 4d4a60c into llvm:main Nov 17, 2025
9 of 10 checks passed

lukel97 mentioned this pull request Nov 19, 2025

Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)" #168738

Closed

	if (match(V,
	// For scalar plans, the header mask uses the scalar steps.
	if (match(V,

Conversation

lukel97 commented Nov 13, 2025

Uh oh!

llvmbot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lukel97 commented Nov 14, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arcbbb left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

llvmbot commented Nov 13, 2025 •

edited

Loading