
[HLSL][DirectX] Emit convergence control tokens when targeting DirectX #188792

Merged
inbelic merged 13 commits into llvm:main from inbelic:inbelic/conv-ctrl on Apr 20, 2026


@inbelic inbelic commented Mar 26, 2026

Contributor

This PR enables codegen to emit convergence control tokens when targeting DirectX. The tokens describe convergence behaviour more accurately, so that invalid control flow graph transforms are prevented and valid ones remain allowed. As noted, using convergence control tokens is the intended norm, and this change follows it by enabling them for DirectX.

The immediate motivation is to prevent a convergent loop exit condition from being illegally moved across control flow. Test cases for this are added explicitly.
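
To illustrate why the placement of a convergent operation matters, here is a toy, CPU-only model. It is not LLVM or HLSL code, and every name in it is hypothetical. Each lane of a wave runs a loop for its own trip count, and a wave-wide count of active lanes is evaluated either inside the loop or after it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a convergent operation: its result depends on which lanes
// execute it together.
static int countActive(const std::vector<bool> &Active) {
  int N = 0;
  for (bool A : Active)
    N += A ? 1 : 0;
  return N;
}

// Convergent op evaluated inside the loop. On iteration Iter, only lanes
// whose trip count exceeds Iter are still active. Returns the value each
// lane observed on its last iteration (0 if it never entered the loop).
std::vector<int> lastSeenInsideLoop(const std::vector<int> &TripCounts) {
  for (int T : TripCounts)
    assert(T >= 0 && "trip counts must be non-negative");
  std::vector<int> Seen(TripCounts.size(), 0);
  for (int Iter = 0;; ++Iter) {
    std::vector<bool> Active(TripCounts.size());
    for (std::size_t L = 0; L < TripCounts.size(); ++L)
      Active[L] = TripCounts[L] > Iter;
    const int N = countActive(Active);
    if (N == 0)
      break; // every lane has left the loop
    for (std::size_t L = 0; L < TripCounts.size(); ++L)
      if (Active[L])
        Seen[L] = N;
  }
  return Seen;
}

// Same op moved after the loop. In this model all lanes have reconverged by
// then, so every lane observes the full wave size.
std::vector<int> seenAfterLoop(const std::vector<int> &TripCounts) {
  std::vector<bool> Active(TripCounts.size(), true);
  return std::vector<int>(TripCounts.size(), countActive(Active));
}
```

With trip counts `{1, 2, 3, 3}`, the in-loop placement yields a different value per lane, while the after-loop placement yields the wave size for every lane. A transform that moves the operation across the loop boundary therefore changes program results, which is the kind of motion the tokens are meant to rule out.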

Please see the individual commits for logically similar chunks. Unfortunately, it is tricky to stage this in smaller individual commits.

Resolves #180621.

#188537 is a prerequisite for this change to pass the HLSL offload test suite.

Assisted by: GitHub Copilot

inbelic added 9 commits March 26, 2026 15:56
This follows the previous correction made in llvm#140120.

When emitting intrinsics marked as convergent, we need to annotate them
with the correct convergence tokens.

This commit fixes tests that use a CHECK-NEXT by inserting a CHECK-NEXT
for the newly added intrinsic call.

These files were checking hard-coded SSA register names, which makes
them inflexible to any codegen updates (like this change).

Update the checks to match in a flexible manner.
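
The brittleness described in the last commit message can be sketched with `std::regex` as a stand-in for a test matcher. This is not FileCheck syntax (the diff itself uses FileCheck forms such as `[[X_ADDR:%.*]]` and `%[[#]]`), and the helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if any part of Line matches Pattern (ECMAScript regex syntax).
bool lineMatches(const std::string &Line, const std::string &Pattern) {
  assert(!Pattern.empty() && "an empty pattern would match everything");
  return std::regex_search(Line, std::regex(Pattern));
}

// A check tied to one specific SSA number.
std::string hardCodedCheck() { return R"(%0 = load i32)"; }

// A check that accepts any SSA number.
std::string flexibleCheck() { return R"(%[0-9]+ = load i32)"; }
```

If a newly emitted unnamed value (such as a convergence entry token) takes `%0`, a load that used to be `%0` becomes `%1`. The hard-coded check then fails even though the load itself is unchanged, while the flexible check keeps matching.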
void maybeAttachRangeForLoad(llvm::LoadInst *Load, QualType Ty,
SourceLocation Loc);

private:

@inbelic inbelic Mar 26, 2026

Contributor Author


This was previously a public method but was marked private in a clean-up pr as it had no uses at the time and the same effect can be achieved with public members.

@inbelic inbelic marked this pull request as ready for review March 26, 2026 19:56
@inbelic inbelic changed the title from "[HLSL] Emit convergence control tokens when targeting DirectX" to "[HLSL][DirectX] Emit convergence control tokens when targeting DirectX" Mar 26, 2026
@llvmbot llvmbot added the clang:codegen (IR generation bugs: mangling, exceptions, etc.), backend:DirectX, HLSL (HLSL Language Support), and llvm:transforms labels Mar 26, 2026
@llvmbot

llvmbot commented Mar 26, 2026

Member

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-hlsl

@llvm/pr-subscribers-backend-directx

Author: Finn Plummer (inbelic)

Changes



Patch is 224.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/188792.diff

72 Files Affected:

  • (modified) clang/lib/CodeGen/CGExprAgg.cpp (+12)
  • (modified) clang/lib/CodeGen/CGHLSLBuiltins.cpp (+7-3)
  • (modified) clang/lib/CodeGen/CGHLSLRuntime.cpp (+20-3)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+1-1)
  • (modified) clang/lib/CodeGen/CodeGenModule.h (+1-1)
  • (modified) clang/test/CodeGenDirectX/Builtins/dot2add.c (+1)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl (+2)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl (+29)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixConstructor.hlsl (+3)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixElementTypeCast.hlsl (+9)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixExplicitTruncation.hlsl (+9)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixImplicitTruncation.hlsl (+8)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSingleSubscriptConstSwizzle.hlsl (+7)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSingleSubscriptDynamicSwizzle.hlsl (+5)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSingleSubscriptGetter.hlsl (+10)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSingleSubscriptSetter.hlsl (+5)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSplat.hlsl (+12)
  • (modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixToAndFromVectorConstructors.hlsl (+5)
  • (modified) clang/test/CodeGenHLSL/BoolMatrix.hlsl (+8)
  • (modified) clang/test/CodeGenHLSL/GlobalConstructorFunction.hlsl (+6-5)
  • (modified) clang/test/CodeGenHLSL/GlobalConstructorLib.hlsl (+2)
  • (modified) clang/test/CodeGenHLSL/GlobalConstructors.hlsl (+3-2)
  • (modified) clang/test/CodeGenHLSL/GlobalDestructors.hlsl (+15-14)
  • (modified) clang/test/CodeGenHLSL/builtins/AddUint64.hlsl (+2)
  • (modified) clang/test/CodeGenHLSL/builtins/ScalarSwizzles.hlsl (+1-1)
  • (modified) clang/test/CodeGenHLSL/builtins/abs.hlsl (+1-1)
  • (modified) clang/test/CodeGenHLSL/builtins/ceil.hlsl (+1-1)
  • (modified) clang/test/CodeGenHLSL/builtins/f16tof32-builtin.hlsl (+4-4)
  • (modified) clang/test/CodeGenHLSL/builtins/f16tof32.hlsl (+4-4)
  • (modified) clang/test/CodeGenHLSL/builtins/f32tof16-builtin.hlsl (+8-8)
  • (modified) clang/test/CodeGenHLSL/builtins/f32tof16.hlsl (+8-8)
  • (modified) clang/test/CodeGenHLSL/builtins/floor.hlsl (+1-1)
  • (modified) clang/test/CodeGenHLSL/builtins/mad.hlsl (+24-24)
  • (modified) clang/test/CodeGenHLSL/convergence/cf.for.plain.hlsl (+2)
  • (modified) clang/test/CodeGenHLSL/convergence/do.while.hlsl (+16-14)
  • (modified) clang/test/CodeGenHLSL/convergence/entry.point.hlsl (+3-2)
  • (modified) clang/test/CodeGenHLSL/convergence/for.hlsl (+28-26)
  • (modified) clang/test/CodeGenHLSL/convergence/global_array.hlsl (+3-2)
  • (modified) clang/test/CodeGenHLSL/convergence/while.hlsl (+21-19)
  • (modified) clang/test/CodeGenHLSL/inline-constructors.hlsl (+4-2)
  • (modified) clang/test/CodeGenHLSL/matrix-member-one-based-accessor-scalar-load.hlsl (+16)
  • (modified) clang/test/CodeGenHLSL/matrix-member-one-based-accessor-scalar-store.hlsl (+16)
  • (modified) clang/test/CodeGenHLSL/matrix-member-one-based-swizzle-load.hlsl (+8)
  • (modified) clang/test/CodeGenHLSL/matrix-member-one-based-swizzle-store.hlsl (+8)
  • (modified) clang/test/CodeGenHLSL/matrix-member-zero-based-accessor-scalar-load.hlsl (+16)
  • (modified) clang/test/CodeGenHLSL/matrix-member-zero-based-accessor-scalar-store.hlsl (+16)
  • (modified) clang/test/CodeGenHLSL/matrix-member-zero-based-swizzle-load.hlsl (+8)
  • (modified) clang/test/CodeGenHLSL/matrix-member-zero-based-swizzle-store.hlsl (+8)
  • (modified) clang/test/CodeGenHLSL/resources/ByteAddressBuffers-constructors.hlsl (+4)
  • (modified) clang/test/CodeGenHLSL/resources/ByteAddressBuffers-methods.hlsl (+3-3)
  • (modified) clang/test/CodeGenHLSL/resources/CBufferMatrixSingleSubscriptSwizzle.hlsl (+1)
  • (modified) clang/test/CodeGenHLSL/resources/MatrixElement_cbuffer.hlsl (+3)
  • (modified) clang/test/CodeGenHLSL/resources/StructuredBuffers-methods-lib.hlsl (+2-2)
  • (modified) clang/test/CodeGenHLSL/resources/StructuredBuffers-methods-ps.hlsl (+1-1)
  • (modified) clang/test/CodeGenHLSL/resources/TypedBuffers-constructor.hlsl (+4)
  • (modified) clang/test/CodeGenHLSL/resources/TypedBuffers-methods.hlsl (+1-1)
  • (modified) clang/test/CodeGenHLSL/resources/cbuffer.hlsl (+1)
  • (modified) clang/test/CodeGenHLSL/resources/cbuffer_with_packoffset.hlsl (+1)
  • (modified) clang/test/CodeGenHLSL/resources/res-array-global-subarray-many.hlsl (+15-1)
  • (modified) clang/test/CodeGenHLSL/resources/res-array-global-subarray-one.hlsl (+8-1)
  • (modified) clang/test/CodeGenHLSL/resources/res-array-local-multi-dim.hlsl (+2)
  • (modified) clang/test/CodeGenHLSL/resources/res-array-local1.hlsl (+2)
  • (modified) clang/test/CodeGenHLSL/resources/res-array-local2.hlsl (+2)
  • (modified) clang/test/CodeGenHLSL/resources/res-array-local3.hlsl (+2)
  • (modified) clang/test/CodeGenHLSL/static-local-ctor.hlsl (+1)
  • (modified) clang/test/CodeGenHLSL/this-assignment-overload.hlsl (+6-4)
  • (modified) clang/test/CodeGenHLSL/this-assignment.hlsl (+3)
  • (modified) clang/test/CodeGenHLSL/this-reference.hlsl (+4-3)
  • (modified) clang/test/SemaHLSL/Resources/static_resources.hlsl (+7)
  • (added) llvm/test/Transforms/IndVarSimplify/convergent-controlled-loop.ll (+63)
  • (added) llvm/test/Transforms/LoopRotate/convergent-controlled.ll (+63)
  • (added) llvm/test/Transforms/SimpleLoopUnswitch/convergent-controlled.ll (+62)
diff --git a/clang/lib/CodeGen/CGExprAgg.cpp b/clang/lib/CodeGen/CGExprAgg.cpp
index 3a4291719da74..d3dc1014471ec 100644
--- a/clang/lib/CodeGen/CGExprAgg.cpp
+++ b/clang/lib/CodeGen/CGExprAgg.cpp
@@ -715,6 +715,9 @@ void AggExprEmitter::EmitArrayInit(Address DestPtr, llvm::ArrayType *AType,
         Builder.CreatePHI(element->getType(), 2, "arrayinit.cur");
     currentElement->addIncoming(element, entryBB);
 
+    if (CGF.CGM.shouldEmitConvergenceTokens())
+      CGF.ConvergenceTokenStack.push_back(CGF.emitConvergenceLoopToken(bodyBB));
+
     // Emit the actual filler expression.
     {
       // C++1z [class.temporary]p5:
@@ -746,6 +749,9 @@ void AggExprEmitter::EmitArrayInit(Address DestPtr, llvm::ArrayType *AType,
     Builder.CreateCondBr(done, endBB, bodyBB);
     currentElement->addIncoming(nextElement, Builder.GetInsertBlock());
 
+    if (CGF.CGM.shouldEmitConvergenceTokens())
+      CGF.ConvergenceTokenStack.pop_back();
+
     CGF.EmitBlock(endBB);
   }
 }
@@ -1987,6 +1993,9 @@ void AggExprEmitter::VisitArrayInitLoopExpr(const ArrayInitLoopExpr *E,
   llvm::Value *element =
       Builder.CreateInBoundsGEP(llvmElementType, begin, index);
 
+  if (CGF.CGM.shouldEmitConvergenceTokens())
+    CGF.ConvergenceTokenStack.push_back(CGF.emitConvergenceLoopToken(bodyBB));
+
   // Prepare for a cleanup.
   QualType::DestructionKind dtorKind = elementType.isDestructedType();
   EHScopeStack::stable_iterator cleanup;
@@ -2034,6 +2043,9 @@ void AggExprEmitter::VisitArrayInitLoopExpr(const ArrayInitLoopExpr *E,
   llvm::BasicBlock *endBB = CGF.createBasicBlock("arrayinit.end");
   Builder.CreateCondBr(done, endBB, bodyBB);
 
+  if (CGF.CGM.shouldEmitConvergenceTokens())
+    CGF.ConvergenceTokenStack.pop_back();
+
   CGF.EmitBlock(endBB);
 
   // Leave the partial-array cleanup if we entered one.
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 29c41893bdbc4..0ef0fa8630f21 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -543,9 +543,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID,
     Value *IndexOp = EmitScalarExpr(E->getArg(1));
 
     llvm::Type *RetTy = ConvertType(E->getType());
-    return Builder.CreateIntrinsic(
-        RetTy, CGM.getHLSLRuntime().getCreateResourceGetPointerIntrinsic(),
-        ArrayRef<Value *>{HandleOp, IndexOp});
+    llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+        &CGM.getModule(),
+        CGM.getHLSLRuntime().getCreateResourceGetPointerIntrinsic(),
+        {RetTy, HandleOp->getType(), IndexOp->getType()});
+    llvm::CallInst *CI = EmitRuntimeCall(IntrFn, {HandleOp, IndexOp});
+    CI->setCallingConv(IntrFn->getCallingConv());
+    return CI;
   }
   case Builtin::BI__builtin_hlsl_resource_sample: {
     Value *HandleOp = EmitScalarExpr(E->getArg(0));
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.cpp b/clang/lib/CodeGen/CGHLSLRuntime.cpp
index 4e6f853890c83..2feb69668d87c 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.cpp
+++ b/clang/lib/CodeGen/CGHLSLRuntime.cpp
@@ -657,8 +657,16 @@ CGHLSLRuntime::emitDXILUserSemanticLoad(llvm::IRBuilder<> &B, llvm::Type *Type,
                             llvm::PoisonValue::get(B.getInt32Ty())};
 
   llvm::Intrinsic::ID IntrinsicID = llvm::Intrinsic::dx_load_input;
-  llvm::Value *Value = B.CreateIntrinsic(/*ReturnType=*/Type, IntrinsicID, Args,
-                                         nullptr, VariableName);
+
+  SmallVector<OperandBundleDef, 1> OB;
+  if (auto *Token = getConvergenceToken(*B.GetInsertBlock())) {
+    llvm::Value *bundleArgs[] = {Token};
+    OB.emplace_back("convergencectrl", bundleArgs);
+  }
+
+  llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+      B.GetInsertBlock()->getModule(), IntrinsicID, {Type});
+  llvm::Value *Value = B.CreateCall(IntrFn, Args, OB, VariableName);
   return Value;
 }
 
@@ -676,7 +684,16 @@ void CGHLSLRuntime::emitDXILUserSemanticStore(llvm::IRBuilder<> &B,
                             Source};
 
   llvm::Intrinsic::ID IntrinsicID = llvm::Intrinsic::dx_store_output;
-  B.CreateIntrinsic(/*ReturnType=*/CGM.VoidTy, IntrinsicID, Args, nullptr);
+
+  SmallVector<OperandBundleDef, 1> OB;
+  if (auto *Token = getConvergenceToken(*B.GetInsertBlock())) {
+    llvm::Value *bundleArgs[] = {Token};
+    OB.emplace_back("convergencectrl", bundleArgs);
+  }
+
+  llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+      B.GetInsertBlock()->getModule(), IntrinsicID, {Source->getType()});
+  B.CreateCall(IntrFn, Args, OB);
 }
 
 llvm::Value *CGHLSLRuntime::emitUserSemanticLoad(
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 0ff93d2ce7363..26c1e6ee00b57 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -5421,11 +5421,11 @@ class CodeGenFunction : public CodeGenTypeCache {
   void maybeAttachRangeForLoad(llvm::LoadInst *Load, QualType Ty,
                                SourceLocation Loc);
 
-private:
   // Emits a convergence_loop instruction for the given |BB|, with |ParentToken|
   // as it's parent convergence instr.
   llvm::ConvergenceControlInst *emitConvergenceLoopToken(llvm::BasicBlock *BB);
 
+private:
   // Adds a convergence_ctrl token with |ParentToken| as parent convergence
   // instr to the call |Input|.
   llvm::CallBase *addConvergenceControlToken(llvm::CallBase *Input);
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 0a697c84b66a7..bfd879f21d8b6 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1811,7 +1811,7 @@ class CodeGenModule : public CodeGenTypeCache {
   bool shouldEmitConvergenceTokens() const {
     // TODO: this should probably become unconditional once the controlled
     // convergence becomes the norm.
-    return getTriple().isSPIRVLogical();
+    return getTriple().isSPIRVLogical() || getTriple().isDXIL();
   }
 
   void addUndefinedGlobalForTailCall(
diff --git a/clang/test/CodeGenDirectX/Builtins/dot2add.c b/clang/test/CodeGenDirectX/Builtins/dot2add.c
index 4275a285012b0..bc5073995522e 100644
--- a/clang/test/CodeGenDirectX/Builtins/dot2add.c
+++ b/clang/test/CodeGenDirectX/Builtins/dot2add.c
@@ -8,6 +8,7 @@ typedef half half2 __attribute__((ext_vector_type(2)));
 // CHECK-LABEL: define float @test_dot2add(
 // CHECK-SAME: <2 x half> noundef [[X:%.*]], <2 x half> noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X_ADDR:%.*]] = alloca <2 x half>, align 2
 // CHECK-NEXT:    [[Y_ADDR:%.*]] = alloca <2 x half>, align 2
 // CHECK-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
diff --git a/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl b/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
index 832c4ac9b10f5..b4235eed318e4 100644
--- a/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
+++ b/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
@@ -3,12 +3,14 @@
 typedef int Foo[2];
 
 // CHECK-LABEL: define void {{.*}}boop{{.*}}(ptr dead_on_unwind noalias writable sret([2 x i32]) align 4 %agg.result)
+// CHECK:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK: [[G:%.*]] = alloca [2 x i32], align 4
 // CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[G]], ptr align 4 {{.*}}, i32 8, i1 false)
 // CHECK-NEXT: [[AIB:%.*]] = getelementptr inbounds [2 x i32], ptr %agg.result, i32 0, i32 0
 // CHECK-NEXT: br label %arrayinit.body
 // CHECK: arrayinit.body:
 // CHECK-NEXT: [[AII:%.*]] = phi i32 [ 0, %entry ], [ %arrayinit.next, %arrayinit.body ]
+// CHECK-NEXT: %[[#CV_LOOP:]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %[[#C_ENTRY]]) ]
 // CHECK-NEXT: [[X:%.*]] = getelementptr inbounds i32, ptr [[AIB]], i32 [[AII]]
 // CHECK-NEXT: [[AI:%.*]] = getelementptr inbounds nuw [2 x i32], ptr [[G]], i32 0, i32 [[AII]]
 // CHECK-NEXT: [[Y:%.*]] = load i32, ptr [[AI]], align 4
diff --git a/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl b/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
index 3d7b8a906cdae..b34fd190a057c 100644
--- a/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
+++ b/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
@@ -66,6 +66,7 @@ struct UnnamedDerived : UnnamedOnly {};
 // CHECK-LABEL: define hidden void @_Z5case1v(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i32(ptr align 1 [[AGG_RESULT]], ptr align 1 @__const._Z5case1v.TF1, i32 8, i1 false)
 // CHECK-NEXT:    ret void
 //
@@ -78,6 +79,7 @@ TwoFloats case1() {
 // CHECK-LABEL: define hidden void @_Z5case2v(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i32(ptr align 1 [[AGG_RESULT]], ptr align 1 @__const._Z5case2v.TF2, i32 8, i1 false)
 // CHECK-NEXT:    ret void
 //
@@ -90,6 +92,7 @@ TwoFloats case2() {
 // CHECK-LABEL: define hidden void @_Z5case3i(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], i32 noundef [[VAL:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[VAL_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    store i32 [[VAL]], ptr [[VAL_ADDR]], align 4
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -110,6 +113,7 @@ TwoFloats case3(int Val) {
 // CHECK-LABEL: define hidden void @_Z5case4Dv2_i(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], <2 x i32> noundef [[TWOVALS:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[TWOVALS_ADDR:%.*]] = alloca <2 x i32>, align 4
 // CHECK-NEXT:    store <2 x i32> [[TWOVALS]], ptr [[TWOVALS_ADDR]], align 4
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -133,6 +137,7 @@ TwoFloats case4(int2 TwoVals) {
 // CHECK-LABEL: define hidden void @_Z5case5Dv2_i(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], <2 x i32> noundef [[TWOVALS:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[TWOVALS_ADDR:%.*]] = alloca <2 x i32>, align 4
 // CHECK-NEXT:    store <2 x i32> [[TWOVALS]], ptr [[TWOVALS_ADDR]], align 4
 // CHECK-NEXT:    [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -155,6 +160,7 @@ TwoInts case5(int2 TwoVals) {
 // CHECK-LABEL: define hidden void @_Z5case69TwoFloats(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF4:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[TF4]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[X]], align 1
@@ -177,6 +183,7 @@ TwoInts case6(TwoFloats TF4) {
 // CHECK-LABEL: define hidden void @_Z5case77TwoIntsS_i9TwoFloatsS0_S0_S0_(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_DOGGO:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOINTS:%.*]]) align 1 [[TI1:%.*]], ptr noundef byval([[STRUCT_TWOINTS]]) align 1 [[TI2:%.*]], i32 noundef [[VAL:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF1:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF2:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF3:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF4:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[VAL_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    store i32 [[VAL]], ptr [[VAL_ADDR]], align 4
 // CHECK-NEXT:    [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -241,6 +248,7 @@ Doggo case7(TwoInts TI1, TwoInts TI2, int Val, TwoFloats TF1, TwoFloats TF2,
 // CHECK-LABEL: define hidden void @_Z5case85Doggo(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_ANIMALBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_DOGGO:%.*]]) align 1 [[D1:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[LEGS:%.*]] = getelementptr inbounds nuw [[STRUCT_ANIMALBITS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[D1]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, ptr [[LEGSTATE]], align 1
@@ -327,6 +335,7 @@ AnimalBits case8(Doggo D1) {
 // CHECK-LABEL: define hidden void @_Z5case95Doggo10AnimalBits(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_ZOO:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_DOGGO:%.*]]) align 1 [[D1:%.*]], ptr noundef byval([[STRUCT_ANIMALBITS:%.*]]) align 1 [[A1:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[DOGS:%.*]] = getelementptr inbounds nuw [[STRUCT_ZOO]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[DOGS]], i32 0, i32 0
 // CHECK-NEXT:    [[LEGSTATE1:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[D1]], i32 0, i32 0
@@ -743,6 +752,7 @@ Zoo case9(Doggo D1, AnimalBits A1) {
 // CHECK-LABEL: define hidden void @_Z6case109TwoFloatsS_(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF1:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF2:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[X1:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[TF1]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[X1]], align 1
@@ -770,6 +780,7 @@ FourFloats case10(TwoFloats TF1, TwoFloats TF2) {
 // CHECK-LABEL: define hidden void @_Z6case11f(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], float noundef nofpclass(nan inf) [[F:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    [[REF_TMP:%.*]] = alloca <4 x float>, align 4
 // CHECK-NEXT:    [[REF_TMP1:%.*]] = alloca <4 x float>, align 4
@@ -819,6 +830,7 @@ FourFloats case11(float F) {
 // CHECK-LABEL: define hidden void @_Z6case12ii(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_SLICYBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], i32 noundef [[I:%.*]], i32 noundef [[J:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    [[J_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
@@ -841,6 +853,7 @@ SlicyBits case12(int I, int J) {
 // CHECK-LABEL: define hidden void @_Z6case137TwoInts(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_SLICYBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOINTS:%.*]]) align 1 [[TI:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[TI]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[Z]], align 1
 // CHECK-NEXT:    [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
@@ -861,6 +874,7 @@ SlicyBits case13(TwoInts TI) {
 // CHECK-LABEL: define hidden void @_Z6case149SlicyBits(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_SLICYBITS:%.*]]) align 1 [[SB:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[BF_LOAD:%.*]] = load i8, ptr [[SB]], align 1
 // CHECK-NEXT:    [[BF_CAST:%.*]] = sext i8 [[BF_LOAD]] to i32
@@ -881,6 +895,7 @@ TwoInts case14(SlicyBits SB) {
 // CHECK-LABEL: define hidden void @_Z6case159SlicyBits(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_SLICYBITS:%.*]]) align 1 [[SB:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[BF_LOAD:%.*]] = load i8, ptr [[SB]], align 1
 // CHECK-NEXT:    [[BF_CAST:%.*]] = sext i8 [[BF_LOAD]] to i32
@@ -904,6 +919,7 @@ TwoFloats case15(SlicyBits SB) {
 // CHECK-LABEL: define hidden void @_Z7makeTwoRf(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noalias noundef nonnull align 4 dereferenceable(4) [[X:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X_ADDR:%.*]] = alloca ptr, align 4
 // CHECK-NEXT:    store ptr [[X]], ptr [[X_ADDR]], align 4
 // CHECK-NEXT:    [[X1:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -930,6 +946,7 @@ TwoFloats makeTwo(inout float X) {
 // CHECK-LABEL: define hidden void @_Z6case16v(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X:%.*]] = alloca float, align 4
 // CHECK-NEXT:    [[REF_TMP:%.*]] = alloca [[STRUCT_TWOFLOATS:%.*]], align 1
 // CHECK-NEXT:    [[TMP:%.*]] = alloca float, align 4
@@ -963,6 +980,7 @@ FourFloats case16() {
 // CHECK-LABEL: define hidden noundef i32 @_Z12case17Helperi(
 // CHECK-SAME: i32 noundef [[X:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
@@ -976,6 +994,7 @@ int case17Helper(int x) {
 // CHECK-LABEL: define hidden void @_Z6case17v(
 // CHECK-SAME: ) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X:%.*]] = alloca <2 x i32>, align 4
 // CHECK-NEXT:    [[CALL:%.*]] = call noundef i32 @_Z12case17Helperi(i32 noundef 0) #[[ATTR2]]
 // CHECK-NEXT:    [[CALL1:%.*]] = call noundef i32 @_Z12case17Helperi(i32 nou...
[truncated]
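
The `CGExprAgg.cpp` hunks above pair a `push_back` onto `ConvergenceTokenStack` when an array-init loop body is entered with a `pop_back` before the end block is emitted. The following is a toy model of that stack discipline. The types are hypothetical stand-ins rather than clang's classes, and it assumes that a new loop token takes whatever token is on top of the stack as its parent.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a convergence token: its own name plus the name
// of the token it was derived from (empty for the function entry token).
struct Token {
  std::string Name;
  std::string Parent;
};

class TokenStack {
  std::vector<Token> Stack;
  int NextLoopId = 0;

public:
  // Models the entry token emitted at the top of a function.
  TokenStack() { Stack.push_back(Token{"entry", ""}); }

  // Models entering a loop body: the new loop token's parent is the
  // current top of the stack, and the new token becomes the top.
  Token enterLoop() {
    Token T{"loop" + std::to_string(NextLoopId++), Stack.back().Name};
    Stack.push_back(T);
    return T;
  }

  // Models leaving the loop: the enclosing token becomes current again.
  void exitLoop() {
    assert(Stack.size() > 1 && "exitLoop without a matching enterLoop");
    Stack.pop_back();
  }

  // The token a convergent call emitted at this point would reference.
  Token current() const { return Stack.back(); }
};
```

Keeping the push and pop balanced means that code emitted after the loop references the enclosing token again rather than the loop's token, which matches the `"convergencectrl"(token %[[#C_ENTRY]])` bundle on the loop intrinsic in the `ArrayReturn.hlsl` checks.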

@s-perron s-perron requested a review from Keenuts March 27, 2026 00:46

@Icohedron Icohedron left a comment

Contributor


LGTM

@bob80905 bob80905 left a comment

Contributor


Looks good to my untrained eye, would recommend updating the new test descriptions and maybe even being a bit verbose with them.

inbelic added a commit that referenced this pull request Apr 16, 2026
The `SPIRVStripConvergenceIntrinsic` pass was written as a SPIR-V pass because
SPIR-V is currently the only target that emits convergence tokens during
codegen. Nothing in the pass is target-specific, and we plan to
emit convergence tokens when targeting DirectX (and all targets in
general), so move the pass to a common place.

The previous pass used temporary `Undef`s. As part of moving the pass, we
can simply reverse the traversal order to remove the use of `Undef`, which
is deprecated.

Enables the pass when targeting DirectX and is a prerequisite for
#188792.

Assisted by: GitHub Copilot
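
The point about traversal order can be sketched with a toy model. It is not the actual pass: the `Inst` type and `stripTokens` function are hypothetical, and the model assumes a straight-line instruction list in which every token is defined before its users.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction. IsToken marks a convergence-token intrinsic.
// Operand is the index of the token this instruction references, or -1.
struct Inst {
  bool IsToken;
  int Operand;
  bool Erased;
};

// Strips token references from ordinary instructions and erases token
// intrinsics, visiting in forward or reverse order. Returns how many live
// references to a token existed at the moment that token was erased; each
// one would need a temporary placeholder value in a real pass.
int stripTokens(std::vector<Inst> &Insts, bool Reverse) {
  int Placeholders = 0;
  const int N = static_cast<int>(Insts.size());
  for (int Step = 0; Step < N; ++Step) {
    const int I = Reverse ? N - 1 - Step : Step;
    assert(Insts[I].Operand < I && "tokens must be defined before use");
    if (!Insts[I].IsToken) {
      Insts[I].Operand = -1; // drop the token reference from this call
      continue;
    }
    for (const Inst &User : Insts)
      if (!User.Erased && User.Operand == I)
        ++Placeholders;
    Insts[I].Erased = true;
  }
  return Placeholders;
}
```

For an entry token, a loop token derived from it, and two calls that reference them, the forward walk erases each token while it still has live users. The reverse walk reaches every user before its definition, so no placeholder is needed in this model.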
@inbelic

inbelic commented Apr 16, 2026

Contributor Author

@Keenuts I will merge this tomorrow after I confirm the HLSL test suite is passing as expected. Let me know if I should hold until you get a chance to review. Thanks.

llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 16, 2026
…ls (#188537)

@github-actions

github-actions Bot commented Apr 16, 2026


🪟 Windows x64 Test Results

  • 135355 tests passed
  • 4455 tests skipped

✅ The build succeeded and all tests passed.

cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request Apr 16, 2026
…ls (#188537)

inbelic added 2 commits April 16, 2026 18:11
These changes were introduced between the PR being opened and now. Fix them in
the same manner as before.

@hekota hekota left a comment

Member

Just a few nits in the test changes.

Texture2D<float4> t;

// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #1 {
// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #2 {

Member

Suggested change
// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #2 {
// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc)

// CHECK: %hlsl.f32tof16 = call i32 @llvm.dx.legacyf32tof16.f32(float %[[#]])
// CHECK: ret i32 %hlsl.f32tof16
// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #1
// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #2

Member

Suggested change
// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #2
// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float)

The function attributes are not important and can be removed. This applies to all of the other files you needed to update that have the same pattern.
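
The effect of dropping the trailing attribute group can be sketched with an ordinary regular expression. This is a Python analogy rather than FileCheck itself; `DECL` and `matches_decl` are hypothetical names, and the input lines are modelled on the declarations quoted above. Because the pattern only needs to match part of a line, it keeps matching whether the attribute group is `#1`, `#2`, or something else.

```python
import re

# Pattern that stops before the attribute group, mirroring the suggested CHECK line.
DECL = re.compile(r"declare i32 @llvm\.dx\.legacyf32tof16\.f32\(float\)")


def matches_decl(line: str) -> bool:
    # re.search accepts a match anywhere in the line, much like a CHECK
    # pattern only has to match a portion of an input line.
    return DECL.search(line) is not None


for attr in ("#1", "#2"):
    line = f"declare i32 @llvm.dx.legacyf32tof16.f32(float) {attr}"
    print(attr, matches_decl(line))
```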

Comment on lines +25 to +28
// CHECK: %call1 = call noundef i32 @_ZN4Pair8getFirstEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals) #{{[0-9]+}} [ "convergencectrl"(token %[[#C_ENTRY]]) ]
// CHECK-NEXT: %First = getelementptr inbounds nuw %struct.Pair, ptr %Vals, i32 0, i32 0
// CHECK-NEXT: store i32 %call, ptr %First, align 1
// CHECK-NEXT: %call1 = call reassoc nnan ninf nsz arcp afn noundef nofpclass(nan inf) float @_ZN4Pair9getSecondEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals)
// CHECK-NEXT: store i32 %call1, ptr %First, align 1
// CHECK-NEXT: %call2 = call reassoc nnan ninf nsz arcp afn noundef nofpclass(nan inf) float @_ZN4Pair9getSecondEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals) #{{[0-9]+}} [ "convergencectrl"(token %[[#C_ENTRY]]) ]

Member

This could use a regex for call1 and call2.
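
One way to read this suggestion: capture the result name of the call with a pattern and reuse the captured name in the following check, instead of hard-coding `%call1` or `%call2`. FileCheck expresses this with its `[[NAME:regex]]` capture syntax; the sketch below shows the same idea in Python. The IR lines are shortened, hypothetical versions of the ones quoted above, and `call_result_is_stored` is an illustrative helper, not part of any test.

```python
import re
from typing import List

# Capture whatever name the call result happens to get.
CALL = re.compile(
    r"%(?P<name>[A-Za-z0-9_.]+) = call noundef i32 @_ZN4Pair8getFirstEv\("
)


def call_result_is_stored(lines: List[str]) -> bool:
    """Check that the value produced on the first line is stored on the second."""
    m = CALL.search(lines[0])
    if m is None:
        return False
    # Reuse the captured name rather than a hard-coded register name.
    store = re.compile(r"store i32 %" + re.escape(m.group("name")) + r", ptr %First")
    return store.search(lines[1]) is not None


ir = [
    "%call1 = call noundef i32 @_ZN4Pair8getFirstEv(ptr %Vals)",
    "store i32 %call1, ptr %First, align 1",
]
print(call_result_is_stored(ir))  # prints True
```

With this shape, renumbering the call result (for example to `%call7`) does not require touching the checks, as long as the store still uses the same value.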

alexfh pushed a commit to alexfh/llvm-project that referenced this pull request Apr 18, 2026
…8537)

@inbelic inbelic merged commit 2c8c2bd into llvm:main Apr 20, 2026
10 of 11 checks passed
inbelic added a commit that referenced this pull request Apr 20, 2026
…g DirectX" (#193090)

This change appears to introduce complications when performing a full
loop unroll, as exhibited here:
https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618.
This results in invalid DXIL because the unreachable branch is not correctly
cleaned up.

Initial investigation suggests this is because instructions with
convergence control tokens are still used for analysis when they
are within an unreachable branch.

Reverts #188792
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 20, 2026
…en targeting DirectX" (#193090)

cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request Apr 20, 2026
…en targeting DirectX" (#193090)

s-perron pushed a commit to s-perron/llvm-project that referenced this pull request Apr 21, 2026
llvm#188792)

This PR allows codegen to generate convergence control tokens. This
allows for a more accurate description of convergence behaviour, which
prevents invalid control flow graph transforms (and allows valid ones). As noted, the
use of convergence control tokens is the ideal norm and this follows
that by enabling it for `DirectX`.

This was done now to prevent a convergent exit
condition of a loop from being illegally moved across control flow. Test
cases for this are explicitly added.

Please see the individual commits for logically similar chunks.
Unfortunately, it is tricky to stage this in smaller individual commits.

Resolves llvm#180621.

llvm#188537 is a prerequisite for
this passing the HLSL offload suite tests.

Assisted by: Github Copilot
hekota added a commit that referenced this pull request Apr 23, 2026
…rge attempt) (#193584)

Any expression that accesses a resource or resource array member of a global struct instance must be replaced during codegen by an access of the corresponding implicit global resource variable.

When codegen encounters a `MemberExpr` of a resource type, it traverses the AST to locate the parent struct declaration, building the expected global resource variable name along the way. If the parent declaration is a non-static global struct instance, codegen searches its `HLSLAssociatedResourceDeclAttr` attributes to locate the matching global resource variable and then generates IR code to access the resource global in place of the member access.

Fixes #182989

This is the second try to land this. The [first one](#187127 with #188792 and both PRs had to be reverted. No updates needed to this change. I synced with @inbelic and we agreed
that this one should go in first.
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request Apr 24, 2026
…ls (#188537)

llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request Apr 24, 2026
…en targeting DirectX" (#193090)

inbelic added a commit that referenced this pull request Apr 24, 2026
…ly` and `IntrReadMem` (#193593)

`IntrConvergent` was originally added to `dx.resource.getpointer` to
prevent optimization passes (`SimplifyCFG`, `GVN`) from sinking the
intrinsic out of control flow branches, which would create phi nodes on
the returned pointer.

Using `IntrInaccessibleMemOnly` and `IntrReadMem` semantics still
prevents passes from merging or sinking identical calls across branches.
However, it allows the call to be moved within a single control flow
path.

Updates relevant tests and adds a new test to demonstrate a now legal
potential optimization.

This was discovered when
#188792 caused the following
failure:
https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618.
When emitting convergence control tokens, each resource access is then a
user of the convergence control tokens, which makes its use
unnecessarily restrictive for optimizations and in this case would
prevent a loop unroll from taking place.

Assisted by: Claude Opus 4.6
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request Apr 24, 2026
…ssibleMemOnly` and `IntrReadMem` (#193593)

llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 24, 2026
…ssibleMemOnly` and `IntrReadMem` (#193593)

cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request Apr 24, 2026
…ssibleMemOnly` and `IntrReadMem` (#193593)

yingopq pushed a commit to yingopq/llvm-project that referenced this pull request Apr 29, 2026
…rge attempt) (llvm#193584)

yingopq pushed a commit to yingopq/llvm-project that referenced this pull request Apr 29, 2026
…ly` and `IntrReadMem` (llvm#193593)

KHicketts pushed a commit to KHicketts/llvm-project that referenced this pull request Apr 30, 2026
llvm#188792)

KHicketts pushed a commit to KHicketts/llvm-project that referenced this pull request Apr 30, 2026
…g DirectX" (llvm#193090)

KHicketts pushed a commit to KHicketts/llvm-project that referenced this pull request Apr 30, 2026
…rge attempt) (llvm#193584)

KHicketts pushed a commit to KHicketts/llvm-project that referenced this pull request Apr 30, 2026
…ly` and `IntrReadMem` (llvm#193593)

inbelic added a commit that referenced this pull request May 26, 2026
…g DirectX" (#194452)

The initial landing surfaced 3 somewhat orthogonal issues related to
loop unrolling. These are addressed:
[here](#193592),
[here](#193593) and
[here](#193590).

These issues caused these
[tests](https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618#step:8:87913)
to fail in the offload test suite.

We can verify that the tests now pass as expected (fixing any one of the
3 issues would resolve the failure and allow us to reland).

Some additional tests were added since the revert that are now accounted
for and updated in the reland fixes commit.

This relands #188792
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request May 26, 2026
…en targeting DirectX" (#194452)

llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request May 26, 2026
…en targeting DirectX" (#194452)


Labels

backend:DirectX, clang:codegen (IR generation bugs: mangling, exceptions, etc.), HLSL (HLSL Language Support), llvm:transforms


Development

Successfully merging this pull request may close these issues.

[HLSL][DirectX] Loop is removed when exit condition is convergent

6 participants