[LV] Disable fold tail by masking - when induction vars used outside by niwinanto · Pull Request #81609 · llvm/llvm-project

niwinanto · 2024-02-13T14:45:35Z

When induction variables are used outside the loop body, tail folding
by masking miscompiles, because for users outside of the loop the
final value of the induction is computed separately from the vector
loop.

Fixes #76069
Fixes #51677
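
A rough way to picture the problem: under tail folding, the vector loop covers the trip count rounded up to a multiple of VF, and the extra lanes are masked off. The sketch below assumes the IV's exit value is derived from that rounded-up vector trip count. Under that assumption, a user after the loop sees a value past the last iteration that actually ran. This is a simplified model, not LLVM's code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Scalar semantics of `for (i = 0; i < n; ++i)`: the value of the IV phi
// that a user after the loop observes, i.e. i in the last executed iteration.
std::int64_t scalarLastIV(std::int64_t n) {
  assert(n > 0 && "loop must execute at least once");
  std::int64_t last = 0;
  for (std::int64_t i = 0; i < n; ++i)
    last = i;
  return last;
}

// Hypothetical model of the miscompile: with tail folding the vector loop
// covers ceil(n / vf) * vf lanes (the extra lanes are masked off). If the
// exit value is computed from that rounded-up vector trip count instead of
// from n, the outside user gets a value one or more steps too large.
std::int64_t foldedLastIVFromVectorTripCount(std::int64_t n, std::int64_t vf) {
  assert(n > 0 && vf > 0);
  std::int64_t vectorTripCount = (n + vf - 1) / vf * vf;  // rounded up to a multiple of vf
  return vectorTripCount - 1;  // "end - step" for a phi with start 0 and step 1
}
```

With n = 17 and vf = 2 (the SIZE of the C reproducer and the llvm.loop.vectorize.width in the test metadata), scalarLastIV returns 16, while the folded model returns 17. When n is a multiple of vf, the two agree, so in this model the wrong value only appears for some trip counts.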

github-actions · 2024-02-13T14:45:52Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-02-13T14:46:23Z

@llvm/pr-subscribers-llvm-transforms

Author: Niwin Anto (niwinanto)

Changes

When induction variables are used outside the loop body, tail folding by masking miscompiles.
#76069

Full diff: https://github.com/llvm/llvm-project/pull/81609.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+13)
(added) llvm/test/Transforms/LoopVectorize/no-fold-tail-by-masking-iv-external-uses.ll (+85)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 37a356c43e29a4..d33743e74cbe31 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1552,6 +1552,19 @@ bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
     }
   }
 
+  for (const auto &Entry : getInductionVars()) {
+    PHINode *OrigPhi = Entry.first;
+    for (User *U : OrigPhi->users()) {
+      auto *UI = cast<Instruction>(U);
+      if (!TheLoop->contains(UI)) {
+        LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking, loop IV has an "
+                             "outside user for "
+                          << *UI << "\n");
+        return false;
+      }
+    }
+  }
+
   // The list of pointers that we can safely read and write to remains empty.
   SmallPtrSet<Value *, 8> SafePointers;
 
diff --git a/llvm/test/Transforms/LoopVectorize/no-fold-tail-by-masking-iv-external-uses.ll b/llvm/test/Transforms/LoopVectorize/no-fold-tail-by-masking-iv-external-uses.ll
new file mode 100644
index 00000000000000..f7379df934bd77
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/no-fold-tail-by-masking-iv-external-uses.ll
@@ -0,0 +1,85 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=loop-vectorize -S | FileCheck %s
+
+
+; #include <stdio.h>
+; #define SIZE 17
+;
+; unsigned char result;
+; unsigned char arr_1[SIZE];
+;
+; __attribute__((__noinline__))
+; void test(int limit, unsigned char val, int arr_2[SIZE][SIZE][SIZE]) {
+;     #pragma clang loop vectorize_predicate(enable)
+;     for (short i_5 = 0; i_5 < limit; i_5++) {
+;         arr_1 [i_5] = val;
+;         result = arr_2[0][0][i_5] != arr_2[i_5][i_5][0];
+;     }
+; }
+;
+;int main(void) {
+;  int arr_2[SIZE][SIZE][SIZE];
+;
+;  __builtin_memset(arr_2, 1, sizeof(arr_2));
+;
+;  test(SIZE, 0, arr_2);
+;  printf("%hu \n", result);
+;}
+; clang miss-compiles the above code
+; with vectorize_predicate(enable), result is 0 and 1 without.
+
+
+@result = global i8 0, align 1
+@arr_17 = global [17 x i8] zeroinitializer, align 1
+@a = external global i8, align 1
+
+define void @test(i32 %limit, i8 zeroext %val, ptr readonly %arr_14)   {
+; CHECK-LABEL: @test(
+; CHECK-NOT:       pred.store.if:
+; CHECK-NOT:       pred.store.continue:
+;
+entry:
+  %cmp18 = icmp sgt i32 %limit, 0
+  br i1 %cmp18, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.cond.for.cond.cleanup_crit_edge:              ; preds = %for.body
+  %conv20.lcssa = phi i32 [ %conv20, %for.body ]
+  %arrayidx4 = getelementptr inbounds [17 x i32], ptr %arr_14, i32 0, i32 %conv20.lcssa
+  %0 = load i32, ptr %arrayidx4, align 4, !tbaa !4
+  %arrayidx8 = getelementptr inbounds [17 x [17 x i32]], ptr %arr_14, i32 %conv20.lcssa, i32 %conv20.lcssa
+  %1 = load i32, ptr %arrayidx8, align 4, !tbaa !4
+  %cmp10 = icmp ne i32 %0, %1
+  %conv11 = zext i1 %cmp10 to i8
+  store i8 %conv11, ptr @result, align 1, !tbaa !8
+  br label %for.cond.cleanup
+
+for.cond.cleanup:                                 ; preds = %for.cond.for.cond.cleanup_crit_edge, %entry
+  ret void
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %conv20 = phi i32 [ %conv, %for.body ], [ 0, %for.body.preheader ]
+  %i_5.019 = phi i16 [ %inc, %for.body ], [ 0, %for.body.preheader ]
+  %arrayidx = getelementptr inbounds [17 x i8], ptr @arr_17, i32 0, i32 %conv20
+  store i8 %val, ptr %arrayidx, align 1, !tbaa !8
+  %inc = add i16 %i_5.019, 1
+  %conv = sext i16 %inc to i32
+  %cmp = icmp slt i32 %conv, %limit
+  br i1 %cmp, label %for.body, label %for.cond.for.cond.cleanup_crit_edge, !llvm.loop !9
+}
+
+
+
+!4 = !{!5, !5, i64 0}
+!5 = !{!"int", !6, i64 0}
+!6 = !{!"omnipotent char", !7, i64 0}
+!7 = !{!"Simple C++ TBAA"}
+!8 = !{!6, !6, i64 0}
+!9 = distinct !{!9, !10, !11, !12, !13, !14}
+!10 = !{!"llvm.loop.mustprogress"}
+!11 = !{!"llvm.loop.vectorize.predicate.enable", i1 true}
+!12 = !{!"llvm.loop.vectorize.width", i32 2}
+!13 = !{!"llvm.loop.vectorize.scalable.enable", i1 false}
+!14 = !{!"llvm.loop.vectorize.enable", i1 true}
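
The check added to prepareToFoldTailByMasking() above goes through every induction PHI returned by getInductionVars(). It bails out on the first user that TheLoop does not contain. The sketch below models that logic with toy types. Inst, InductionPhi and canFoldTailByMasking are illustrative names, not LLVM API. Basic blocks are identified by name, and the loop is represented by its set of block names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction: its name and its parent block.
struct Inst {
  std::string name;
  std::string block;  // name of the basic block containing this instruction
};

// Hypothetical stand-in for an induction phi together with its users.
struct InductionPhi {
  Inst phi;
  std::vector<Inst> users;
};

// Mirrors the shape of the new legality check: refuse to fold the tail by
// masking if any induction phi has a user outside the loop's blocks.
bool canFoldTailByMasking(const std::vector<InductionPhi> &inductions,
                          const std::set<std::string> &loopBlocks) {
  for (const InductionPhi &iv : inductions)
    for (const Inst &user : iv.users)
      if (loopBlocks.count(user.block) == 0)
        return false;  // outside user: it would need the IV's exit value
  return true;
}
```

In the test above, %conv20 has an in-loop user (%arrayidx in %for.body) and an outside user (%conv20.lcssa in %for.cond.for.cond.cleanup_crit_edge). The model therefore rejects tail folding for that loop.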

fhahn

Thanks for the patch!

Could you add the test as a separate PR (with a FIXME)? This patch would then just adjust the test, and the diff would show only the change in the test.

Previously there was a patch shared here https://reviews.llvm.org/D115109 by @rickyz (hope it's the same as on Phabricator), but the patch never got pushed through. It would be good to look at the comments and potentially pick it up.

fhahn · 2024-02-13T17:51:52Z

+  br label %for.body
+
+for.cond.for.cond.cleanup_crit_edge:              ; preds = %for.body
+  %conv20.lcssa = phi i32 [ %conv20, %for.body ]


I think the test can be simplified by just returning %conv20.lcssa here

fhahn · 2024-02-13T17:53:16Z

+  ret void
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %conv20 = phi i32 [ %conv, %for.body ], [ 0, %for.body.preheader ]


Does the issue reproduce if all uses of %conv20 are replaced by %i_5.019?

fhahn · 2024-02-13T17:53:40Z

+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %conv20 = phi i32 [ %conv, %for.body ], [ 0, %for.body.preheader ]
+  %i_5.019 = phi i16 [ %inc, %for.body ], [ 0, %for.body.preheader ]


Can the phi be changed to i32, so the sext in the loop isn't needed?

fhahn · 2024-02-13T17:53:50Z

+  %conv20 = phi i32 [ %conv, %for.body ], [ 0, %for.body.preheader ]
+  %i_5.019 = phi i16 [ %inc, %for.body ], [ 0, %for.body.preheader ]
+  %arrayidx = getelementptr inbounds [17 x i8], ptr @arr_17, i32 0, i32 %conv20
+  store i8 %val, ptr %arrayidx, align 1, !tbaa !8


nit: remove !tbaa metadata

fhahn · 2024-02-13T17:54:07Z

+; CHECK-NOT:       pred.store.continue:
+;
+entry:
+  %cmp18 = icmp sgt i32 %limit, 0


nit: the check and branch shouldn't be needed.

fhahn · 2024-02-13T17:55:08Z

+;int main(void) {
+;  int arr_2[SIZE][SIZE][SIZE];
+;
+;  __builtin_memset(arr_2, 1, sizeof(arr_2));


Usually we don't include C/C++ source code, as the IR needs to stand on its own. Below are a few suggestions to further simplify the IR and make it more readable.

It would be helpful if you could instead add a brief comment explaining the issue.

fhahn · 2024-02-13T17:55:52Z

+
+define void @test(i32 %limit, i8 zeroext %val, ptr readonly %arr_14)   {
+; CHECK-LABEL: @test(
+; CHECK-NOT:       pred.store.if:


This is quite fragile; some existing tests use CHECK-NOT: vector.body: to check for not vectorizing.

fhahn · 2024-02-13T17:56:40Z

+
+
+
+!4 = !{!5, !5, i64 0}


The metadata nodes used by TBAA shouldn't be needed after dropping !tbaa.

fhahn · 2024-02-13T17:57:27Z

@@ -0,0 +1,85 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=loop-vectorize -S | FileCheck %s


As this is added as a target-independent test, it probably needs something like -force-vector-width=4 -force-vector-interleave=1 to make sure the vectorizer tries to vectorize independent of the cost-model.

niwinanto · 2024-02-13T19:58:33Z

Thanks @fhahn for the reviews. Great that you mentioned the Phabricator patch; the test there looks good, so I copied it here. As you suggested, I created a new PR for the test case with the default behavior (niwinanto@33ec308) and then updated this PR. However, I think I messed up the git workflow. Could you please take a look and check whether this is what you intended?

rickyz · 2024-02-13T21:53:37Z

Thank you @niwinanto for picking this up (and apologies for letting the change languish for so long despite @fhahn's helpful comments!)

fhahn · 2024-02-19T16:21:34Z

Yeah that looks good, I'll add a few small additional comments. But best to create a separate PR to just add the test case showing the issue first.

niwinanto · 2024-02-19T18:47:02Z

@fhahn That is exactly what I am trying to do: create a separate PR (niwinanto#2). Maybe you can help me figure out what I am doing wrong. I am extremely sorry; I am still getting used to the new workflow.

As you suggested, I created a new commit on a different branch and opened a new PR (for the test, as mentioned above). For some reason it contains the commit from this PR, which I tried to remove by dropping it in an interactive rebase and force-pushing.

I also addressed the feedback regarding the tests.

fhahn · 2024-02-20T09:23:36Z

Looking at https://github.com/niwinanto/llvm-project/pull/2/commits, it looks like there's a single commit adding the test, so that looks good I think? Could you update the destination branch to be upstream llvm-project's main?

niwinanto · 2024-02-20T09:38:19Z

@fhahn Here is the PR, #82329.

niwinanto · 2024-02-29T13:36:48Z

@fhahn Updated the PR to adjust the test, now that the test has been committed separately ahead of this change.

fhahn

LGTM, thanks!

I adjusted the description of the PR a bit to add a few more details.

github-actions · 2024-03-04T11:33:50Z

@niwinanto Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

#149042 added last-active-lane and removed the restriction that we couldn't tail fold loops that had outside users (in AllowedExit). However we still have a restriction that IVs can't have outside users. This was added separately to the AllowedExit restriction in #81609, but it looks like #149042 didn't remove it. AFAICT we currently extract the correct lane for IVs, so this PR relaxes the restriction. This helps a good few loops get tail folded in llvm-test-suite. -force-tail-folding-style=none was added to pr5881-scev-expansion.ll to preserve the original scev expansion, since otherwise we end up with a cttz.elts(false, false, true, true) that blocks SCEV analysis. We should probably teach ConstantFolding to fold it.
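
The extraction of the IV value for outside users, and the cttz.elts mask mentioned above, can be illustrated with a small model. The model assumes that the tail-folding mask is a prefix mask (true...true, false...false). Under that assumption, the last active lane is the first inactive lane minus one, and the first inactive lane is cttz.elts applied to the negated mask. Whether LV lowers last-active-lane exactly this way is an assumption here. The helper names cttzElts, lastActiveLane and extractLastActive are hypothetical. cttzElts also shows how a constant folder could evaluate cttz.elts on a constant mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models llvm.experimental.cttz.elts on a constant mask: the number of
// trailing zero elements, i.e. the index of the first true lane (element 0
// is the least significant). Returns the vector length if no lane is set.
std::size_t cttzElts(const std::vector<bool> &mask) {
  std::size_t n = 0;
  while (n < mask.size() && !mask[n])
    ++n;
  return n;
}

// Assumes a prefix mask, as tail folding produces: the last active lane is
// the first inactive lane minus one. A fully active mask yields the last lane.
std::size_t lastActiveLane(const std::vector<bool> &mask) {
  assert(!mask.empty() && mask[0] && "expected at least one active lane");
  std::vector<bool> inactive(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    inactive[i] = !mask[i];
  return cttzElts(inactive) - 1;
}

// Value an outside user should see: the IV lane of the final vector
// iteration that was actually active.
long extractLastActive(const std::vector<long> &lanes,
                       const std::vector<bool> &mask) {
  assert(lanes.size() == mask.size());
  return lanes[lastActiveLane(mask)];
}
```

For a constant mask such as <false, false, true, true>, cttzElts returns 2, which is the kind of result a constant folder could compute. For a 17-iteration loop with VF = 4, the final vector iteration holds IV lanes 16 to 19 with only the first lane active. extractLastActive then returns 16, matching the scalar loop.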

[LV] Disable fold tail by masking - when induction vars used outside

5d2f84b

llvmbot added vectorizers llvm:transforms labels Feb 13, 2024

fhahn reviewed Feb 13, 2024

Code Review adjustments

5b6abb1

niwinanto and others added 2 commits February 29, 2024 13:46

Merge branch 'main' into niwinanto/tailFoldUsedIV

1f9609a

Adjust test after committing early

7e1d66d

fhahn reviewed Mar 4, 2024

fhahn merged commit eaf0d82 into llvm:main Mar 4, 2024

niwinanto deleted the niwinanto/tailFoldUsedIV branch March 23, 2024 10:46

lukel97 mentioned this pull request Feb 19, 2026

[LV] Allow tail folding with IVs with outside users #182322

Merged

4 participants
