Skip to content

AMDGPU: Fix runtime unrolling when cascaded GEPs present#147700

Merged
macurtis-amd merged 3 commits intollvm:mainfrom
macurtis-amd:unroll-when-gep-cascaded
Jul 10, 2025
Merged

AMDGPU: Fix runtime unrolling when cascaded GEPs present#147700
macurtis-amd merged 3 commits intollvm:mainfrom
macurtis-amd:unroll-when-gep-cascaded

Conversation

@macurtis-amd
Copy link
Contributor

Cascaded GEP (i.e. GEP of GEP) are not handled when determining if it is ok to runtime unroll loops.

This change simply uses getUnderlyingObjects to look through cascaded GEPs.

@llvmbot
Copy link
Member

llvmbot commented Jul 9, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (macurtis-amd)

Changes

Cascaded GEP (i.e. GEP of GEP) are not handled when determining if it is ok to runtime unroll loops.

This change simply uses getUnderlyingObjects to look through cascaded GEPs.


Full diff: https://github.com/llvm/llvm-project/pull/147700.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+6-3)
  • (added) llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll (+44)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index f693580929518..24f4df2aff9d1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -216,10 +216,13 @@ void AMDGPUTTIImpl::getUnrollingPreferences(
         // a variable, most likely we will be unable to combine it.
         // Do not unroll too deep inner loops for local memory to give a chance
         // to unroll an outer loop for a more important reason.
-        if (LocalGEPsSeen > 1 || L->getLoopDepth() > 2 ||
-            (!isa<GlobalVariable>(GEP->getPointerOperand()) &&
-             !isa<Argument>(GEP->getPointerOperand())))
+        if (LocalGEPsSeen > 1 || L->getLoopDepth() > 2)
           continue;
+
+        const Value *V = getUnderlyingObject(GEP->getPointerOperand());
+        if (!isa<GlobalVariable>(V) && !isa<Argument>(V))
+          continue;
+
         LLVM_DEBUG(dbgs() << "Allow unroll runtime for loop:\n"
                           << *L << " due to LDS use.\n");
         UP.Runtime = UnrollRuntimeLocal;
diff --git a/llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll b/llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll
new file mode 100644
index 0000000000000..9414a1e24e542
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll
@@ -0,0 +1,44 @@
+; RUN: opt -mtriple=amdgcn-unknown-amdhsa -passes=loop-unroll -S %s | FileCheck %s
+
+%struct.wombat = type { %struct.zot, i32, [16 x i32], [16 x i32], i32, i32, [16 x i32], i32 }
+%struct.zot = type { i32, i32, [1024 x i32] }
+
+@global = external addrspace(3) global %struct.wombat
+
+; Ensure that a cascaded GEP for local address space does not inhibit unrolling
+;
+; CHECK-LABEL: @unroll_when_cascaded_gep
+; CHECK: bb:
+; CHECK:   br {{.*}}, label %bb2.unr-lcssa, label %bb.new
+; CHECK: bb.new:
+; CHECK:   %unroll_iter = 
+; CHECK:   br label %bb1
+; CHECK: bb1:
+; CHECK:   br {{.*}}, label %bb2.unr-lcssa.loopexit, label %bb1
+; CHECK: bb2.unr-lcssa.loopexit:
+; CHECK:   br label %bb2.unr-lcssa
+; CHECK: bb2.unr-lcssa:
+; CHECK:   br {{.*}}, label %bb1.epil.preheader, label %bb2
+; CHECK: bb1.epil.preheader:
+; CHECK:   br label %bb1.epil
+; CHECK: bb1.epil:
+; CHECK:   br {{.*}}, label %bb1.epil, label %bb2.epilog-lcssa
+; CHECK: bb2.epilog-lcssa:
+; CHECK:   br label %bb2
+; CHECK: bb2:
+; CHECK:   ret void
+define amdgpu_kernel void @unroll_when_cascaded_gep(i32 %arg) {
+bb:
+  br label %bb1
+
+bb1:                                              ; preds = %bb1, %bb
+  %phi = phi i32 [ 0, %bb ], [ %add, %bb1 ]
+  %getelementptr = getelementptr [1024 x i32], ptr addrspace(3) getelementptr inbounds nuw (i8, ptr addrspace(3) @global, i32 8), i32 0, i32 0
+  %add = add i32 %phi, 1
+  %icmp = icmp eq i32 %phi, %arg
+  br i1 %icmp, label %bb2, label %bb1
+
+bb2:                                              ; preds = %bb1
+  ret void
+}
+

@llvmbot
Copy link
Member

llvmbot commented Jul 9, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: None (macurtis-amd)

Changes

Cascaded GEP (i.e. GEP of GEP) are not handled when determining if it is ok to runtime unroll loops.

This change simply uses getUnderlyingObjects to look through cascaded GEPs.


Full diff: https://github.com/llvm/llvm-project/pull/147700.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+6-3)
  • (added) llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll (+44)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index f693580929518..24f4df2aff9d1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -216,10 +216,13 @@ void AMDGPUTTIImpl::getUnrollingPreferences(
         // a variable, most likely we will be unable to combine it.
         // Do not unroll too deep inner loops for local memory to give a chance
         // to unroll an outer loop for a more important reason.
-        if (LocalGEPsSeen > 1 || L->getLoopDepth() > 2 ||
-            (!isa<GlobalVariable>(GEP->getPointerOperand()) &&
-             !isa<Argument>(GEP->getPointerOperand())))
+        if (LocalGEPsSeen > 1 || L->getLoopDepth() > 2)
           continue;
+
+        const Value *V = getUnderlyingObject(GEP->getPointerOperand());
+        if (!isa<GlobalVariable>(V) && !isa<Argument>(V))
+          continue;
+
         LLVM_DEBUG(dbgs() << "Allow unroll runtime for loop:\n"
                           << *L << " due to LDS use.\n");
         UP.Runtime = UnrollRuntimeLocal;
diff --git a/llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll b/llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll
new file mode 100644
index 0000000000000..9414a1e24e542
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll
@@ -0,0 +1,44 @@
+; RUN: opt -mtriple=amdgcn-unknown-amdhsa -passes=loop-unroll -S %s | FileCheck %s
+
+%struct.wombat = type { %struct.zot, i32, [16 x i32], [16 x i32], i32, i32, [16 x i32], i32 }
+%struct.zot = type { i32, i32, [1024 x i32] }
+
+@global = external addrspace(3) global %struct.wombat
+
+; Ensure that a cascaded GEP for local address space does not inhibit unrolling
+;
+; CHECK-LABEL: @unroll_when_cascaded_gep
+; CHECK: bb:
+; CHECK:   br {{.*}}, label %bb2.unr-lcssa, label %bb.new
+; CHECK: bb.new:
+; CHECK:   %unroll_iter = 
+; CHECK:   br label %bb1
+; CHECK: bb1:
+; CHECK:   br {{.*}}, label %bb2.unr-lcssa.loopexit, label %bb1
+; CHECK: bb2.unr-lcssa.loopexit:
+; CHECK:   br label %bb2.unr-lcssa
+; CHECK: bb2.unr-lcssa:
+; CHECK:   br {{.*}}, label %bb1.epil.preheader, label %bb2
+; CHECK: bb1.epil.preheader:
+; CHECK:   br label %bb1.epil
+; CHECK: bb1.epil:
+; CHECK:   br {{.*}}, label %bb1.epil, label %bb2.epilog-lcssa
+; CHECK: bb2.epilog-lcssa:
+; CHECK:   br label %bb2
+; CHECK: bb2:
+; CHECK:   ret void
+define amdgpu_kernel void @unroll_when_cascaded_gep(i32 %arg) {
+bb:
+  br label %bb1
+
+bb1:                                              ; preds = %bb1, %bb
+  %phi = phi i32 [ 0, %bb ], [ %add, %bb1 ]
+  %getelementptr = getelementptr [1024 x i32], ptr addrspace(3) getelementptr inbounds nuw (i8, ptr addrspace(3) @global, i32 8), i32 0, i32 0
+  %add = add i32 %phi, 1
+  %icmp = icmp eq i32 %phi, %arg
+  br i1 %icmp, label %bb2, label %bb1
+
+bb2:                                              ; preds = %bb1
+  ret void
+}
+

@macurtis-amd macurtis-amd requested review from arsenm and jayfoad July 9, 2025 11:32
Copy link
Collaborator

@JonChesterfield JonChesterfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's consistent. Thanks very much for running it to ground!

@macurtis-amd macurtis-amd merged commit cff4a00 into llvm:main Jul 10, 2025
9 checks passed
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 17, 2025
Cascaded GEP (i.e. GEP of GEP) are not handled when determining if it is
ok to runtime unroll loops.

This change simply uses `getUnderlyingObjects` to look through cascaded
GEPs.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 17, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 17, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 17, 2025
jmmartinez added a commit to ROCm/llvm-project that referenced this pull request Feb 4, 2026
jmmartinez added a commit to ROCm/llvm-project that referenced this pull request Feb 4, 2026
ronlieb pushed a commit to ROCm/llvm-project that referenced this pull request Feb 4, 2026
tvukovic-amd pushed a commit to ROCm/llvm-project that referenced this pull request Feb 26, 2026
JeniferC99 added a commit to ROCm/llvm-project that referenced this pull request Feb 26, 2026
ronlieb added a commit to ronlieb/llvm-project that referenced this pull request Feb 26, 2026
chiranjeevipattigidi added a commit to ROCm/llvm-project that referenced this pull request Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants