[HLSL][DirectX] Emit convergence control tokens when targeting DirectX by inbelic · Pull Request #188792 · llvm/llvm-project

inbelic · 2026-03-26T16:33:45Z

This PR enables codegen to emit convergence control tokens when targeting DirectX. The tokens describe convergence behaviour more accurately, which prevents invalid control flow graph transforms while still allowing valid ones. As noted, convergence control tokens are the intended norm, and this PR follows that by enabling them for DirectX.

The immediate motivation is to prevent a loop's convergent exit condition from being illegally moved across control flow. Explicit test cases for this are added.
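To see why such a move is illegal, it can help to model a wave as a set of lanes. The following is a toy sketch, not LLVM code: `Wave`, `waveActiveSum`, `sumInsideLoop`, and `sumAfterLoop` are hypothetical names invented for illustration. It assumes a convergent operation whose result depends on which lanes execute it together.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one value per lane plus a mask of lanes that are
// currently executing the same piece of code.
struct Wave {
  std::vector<int> Value;
  std::vector<bool> Active;
};

// Models a convergent operation: the result depends on the set of lanes
// that execute it together.
int waveActiveSum(const Wave &W) {
  assert(W.Value.size() == W.Active.size());
  int Sum = 0;
  for (std::size_t Lane = 0; Lane < W.Value.size(); ++Lane)
    if (W.Active[Lane])
      Sum += W.Value[Lane];
  return Sum;
}

// Each lane runs the loop TripCount[lane] times. The sum is evaluated
// inside the loop on iteration Iter, so only lanes whose trip count
// exceeds Iter take part.
int sumInsideLoop(const std::vector<int> &Values,
                  const std::vector<int> &TripCount, int Iter) {
  Wave W{Values, std::vector<bool>(Values.size())};
  for (std::size_t Lane = 0; Lane < Values.size(); ++Lane)
    W.Active[Lane] = TripCount[Lane] > Iter;
  return waveActiveSum(W);
}

// The same operation after a transform moved it past the loop exit:
// every lane has left the loop, so all of them take part.
int sumAfterLoop(const std::vector<int> &Values) {
  Wave W{Values, std::vector<bool>(Values.size(), true)};
  return waveActiveSum(W);
}
```

With values `{1, 2, 3, 4}` and trip counts `{1, 3, 3, 1}`, the in-loop sum on iteration 1 only sees the two middle lanes, while the moved version sees all four, so the two placements disagree.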

Please see the individual commits for logically similar chunks. Unfortunately, it is tricky to stage this in smaller individual commits.

Resolves #180621.

#188537 is a prerequisite for this PR to pass the HLSL offload test suite.

Assisted by: GitHub Copilot

This follows the previous correction made in llvm#140120

When emitting intrinsics marked as convergent, we need to annotate them with the correct convergence tokens.
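The bookkeeping behind "the correct convergence token" can be sketched with a small stand-alone model. This is not the Clang implementation: `ToyEmitter` and its members are hypothetical, and the emitted strings only imitate the `convergencectrl` operand bundles seen in the test updates. The assumption is that the innermost token on a stack is the one a convergent call should reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical emitter that records pseudo-IR lines as strings.
struct ToyEmitter {
  std::vector<std::string> Lines;      // emitted pseudo-IR, in order
  std::vector<std::string> TokenStack; // innermost token is at the back
  int NextId = 0;

  std::string freshName() { return "%t" + std::to_string(NextId++); }

  // Function entry defines the outermost token.
  void enterFunction() {
    std::string Tok = freshName();
    Lines.push_back(Tok +
                    " = call token @llvm.experimental.convergence.entry()");
    TokenStack.push_back(Tok);
  }

  // A loop defines a new token whose parent is the current innermost one.
  void enterLoop() {
    assert(!TokenStack.empty() && "loop token needs a parent token");
    std::string Tok = freshName();
    Lines.push_back(Tok +
                    " = call token @llvm.experimental.convergence.loop()"
                    " [ \"convergencectrl\"(token " + TokenStack.back() +
                    ") ]");
    TokenStack.push_back(Tok);
  }

  // Leaving the loop makes the parent token current again.
  void exitLoop() {
    assert(TokenStack.size() > 1 && "no loop token to pop");
    TokenStack.pop_back();
  }

  // A convergent call references whichever token is innermost right now.
  void emitConvergentCall(const std::string &Callee) {
    assert(!TokenStack.empty() && "convergent call outside a function");
    Lines.push_back("call void @" + Callee +
                    "() [ \"convergencectrl\"(token " + TokenStack.back() +
                    ") ]");
  }
};
```

In this model, a call emitted inside a loop picks up the loop token, and a call emitted after `exitLoop()` goes back to the entry token. That mirrors the push and pop pairs the patch adds around the array-initialization loops.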

… init

This commit fixes tests that use CHECK-NEXT by inserting a CHECK-NEXT for the newly added intrinsic call.

These files checked hard-coded SSA register names, which made them inflexible to any codegen update (like this change). Update the checks to match in a flexible manner.
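The difference between a hard-coded and a flexible check can be illustrated with plain regular expressions. This sketch uses `std::regex` rather than FileCheck, and the IR line and both patterns are made-up examples, not taken from the updated tests. It assumes the unnamed SSA number shifts by one once an extra unnamed value is emitted earlier in the function.

```cpp
#include <cassert>
#include <regex>
#include <string>

// A check that pins the exact SSA number (hypothetical example).
const std::string HardCoded = R"(%0 = load i32, ptr %x\.addr)";
// The same check with the SSA number left open.
const std::string Flexible = R"(%[0-9]+ = load i32, ptr %x\.addr)";

// Returns true when Line contains a match for the ECMAScript Pattern.
bool lineMatches(const std::string &Line, const std::string &Pattern) {
  assert(!Pattern.empty() && "an empty pattern would match everything");
  return std::regex_search(Line, std::regex(Pattern));
}
```

Against `%0 = load i32, ptr %x.addr, align 4` both patterns match, but once the value is renumbered to `%1` only `Flexible` still does. FileCheck variables such as `[[X:%.*]]` play the same role in the real tests.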

inbelic · 2026-03-26T16:36:24Z

  void maybeAttachRangeForLoad(llvm::LoadInst *Load, QualType Ty,
                               SourceLocation Loc);

-private:


This was previously a public method, but it was marked private in a clean-up PR because it had no uses at the time and the same effect could be achieved with public members.

llvmbot · 2026-03-26T20:12:00Z

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-hlsl

@llvm/pr-subscribers-backend-directx

Author: Finn Plummer (inbelic)

Patch is 224.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/188792.diff

72 Files Affected:

(modified) clang/lib/CodeGen/CGExprAgg.cpp (+12)
(modified) clang/lib/CodeGen/CGHLSLBuiltins.cpp (+7-3)
(modified) clang/lib/CodeGen/CGHLSLRuntime.cpp (+20-3)
(modified) clang/lib/CodeGen/CodeGenFunction.h (+1-1)
(modified) clang/lib/CodeGen/CodeGenModule.h (+1-1)
(modified) clang/test/CodeGenDirectX/Builtins/dot2add.c (+1)
(modified) clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl (+2)
(modified) clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl (+29)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixConstructor.hlsl (+3)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixElementTypeCast.hlsl (+9)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixExplicitTruncation.hlsl (+9)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixImplicitTruncation.hlsl (+8)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSingleSubscriptConstSwizzle.hlsl (+7)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSingleSubscriptDynamicSwizzle.hlsl (+5)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSingleSubscriptGetter.hlsl (+10)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSingleSubscriptSetter.hlsl (+5)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixSplat.hlsl (+12)
(modified) clang/test/CodeGenHLSL/BasicFeatures/MatrixToAndFromVectorConstructors.hlsl (+5)
(modified) clang/test/CodeGenHLSL/BoolMatrix.hlsl (+8)
(modified) clang/test/CodeGenHLSL/GlobalConstructorFunction.hlsl (+6-5)
(modified) clang/test/CodeGenHLSL/GlobalConstructorLib.hlsl (+2)
(modified) clang/test/CodeGenHLSL/GlobalConstructors.hlsl (+3-2)
(modified) clang/test/CodeGenHLSL/GlobalDestructors.hlsl (+15-14)
(modified) clang/test/CodeGenHLSL/builtins/AddUint64.hlsl (+2)
(modified) clang/test/CodeGenHLSL/builtins/ScalarSwizzles.hlsl (+1-1)
(modified) clang/test/CodeGenHLSL/builtins/abs.hlsl (+1-1)
(modified) clang/test/CodeGenHLSL/builtins/ceil.hlsl (+1-1)
(modified) clang/test/CodeGenHLSL/builtins/f16tof32-builtin.hlsl (+4-4)
(modified) clang/test/CodeGenHLSL/builtins/f16tof32.hlsl (+4-4)
(modified) clang/test/CodeGenHLSL/builtins/f32tof16-builtin.hlsl (+8-8)
(modified) clang/test/CodeGenHLSL/builtins/f32tof16.hlsl (+8-8)
(modified) clang/test/CodeGenHLSL/builtins/floor.hlsl (+1-1)
(modified) clang/test/CodeGenHLSL/builtins/mad.hlsl (+24-24)
(modified) clang/test/CodeGenHLSL/convergence/cf.for.plain.hlsl (+2)
(modified) clang/test/CodeGenHLSL/convergence/do.while.hlsl (+16-14)
(modified) clang/test/CodeGenHLSL/convergence/entry.point.hlsl (+3-2)
(modified) clang/test/CodeGenHLSL/convergence/for.hlsl (+28-26)
(modified) clang/test/CodeGenHLSL/convergence/global_array.hlsl (+3-2)
(modified) clang/test/CodeGenHLSL/convergence/while.hlsl (+21-19)
(modified) clang/test/CodeGenHLSL/inline-constructors.hlsl (+4-2)
(modified) clang/test/CodeGenHLSL/matrix-member-one-based-accessor-scalar-load.hlsl (+16)
(modified) clang/test/CodeGenHLSL/matrix-member-one-based-accessor-scalar-store.hlsl (+16)
(modified) clang/test/CodeGenHLSL/matrix-member-one-based-swizzle-load.hlsl (+8)
(modified) clang/test/CodeGenHLSL/matrix-member-one-based-swizzle-store.hlsl (+8)
(modified) clang/test/CodeGenHLSL/matrix-member-zero-based-accessor-scalar-load.hlsl (+16)
(modified) clang/test/CodeGenHLSL/matrix-member-zero-based-accessor-scalar-store.hlsl (+16)
(modified) clang/test/CodeGenHLSL/matrix-member-zero-based-swizzle-load.hlsl (+8)
(modified) clang/test/CodeGenHLSL/matrix-member-zero-based-swizzle-store.hlsl (+8)
(modified) clang/test/CodeGenHLSL/resources/ByteAddressBuffers-constructors.hlsl (+4)
(modified) clang/test/CodeGenHLSL/resources/ByteAddressBuffers-methods.hlsl (+3-3)
(modified) clang/test/CodeGenHLSL/resources/CBufferMatrixSingleSubscriptSwizzle.hlsl (+1)
(modified) clang/test/CodeGenHLSL/resources/MatrixElement_cbuffer.hlsl (+3)
(modified) clang/test/CodeGenHLSL/resources/StructuredBuffers-methods-lib.hlsl (+2-2)
(modified) clang/test/CodeGenHLSL/resources/StructuredBuffers-methods-ps.hlsl (+1-1)
(modified) clang/test/CodeGenHLSL/resources/TypedBuffers-constructor.hlsl (+4)
(modified) clang/test/CodeGenHLSL/resources/TypedBuffers-methods.hlsl (+1-1)
(modified) clang/test/CodeGenHLSL/resources/cbuffer.hlsl (+1)
(modified) clang/test/CodeGenHLSL/resources/cbuffer_with_packoffset.hlsl (+1)
(modified) clang/test/CodeGenHLSL/resources/res-array-global-subarray-many.hlsl (+15-1)
(modified) clang/test/CodeGenHLSL/resources/res-array-global-subarray-one.hlsl (+8-1)
(modified) clang/test/CodeGenHLSL/resources/res-array-local-multi-dim.hlsl (+2)
(modified) clang/test/CodeGenHLSL/resources/res-array-local1.hlsl (+2)
(modified) clang/test/CodeGenHLSL/resources/res-array-local2.hlsl (+2)
(modified) clang/test/CodeGenHLSL/resources/res-array-local3.hlsl (+2)
(modified) clang/test/CodeGenHLSL/static-local-ctor.hlsl (+1)
(modified) clang/test/CodeGenHLSL/this-assignment-overload.hlsl (+6-4)
(modified) clang/test/CodeGenHLSL/this-assignment.hlsl (+3)
(modified) clang/test/CodeGenHLSL/this-reference.hlsl (+4-3)
(modified) clang/test/SemaHLSL/Resources/static_resources.hlsl (+7)
(added) llvm/test/Transforms/IndVarSimplify/convergent-controlled-loop.ll (+63)
(added) llvm/test/Transforms/LoopRotate/convergent-controlled.ll (+63)
(added) llvm/test/Transforms/SimpleLoopUnswitch/convergent-controlled.ll (+62)

diff --git a/clang/lib/CodeGen/CGExprAgg.cpp b/clang/lib/CodeGen/CGExprAgg.cpp
index 3a4291719da74..d3dc1014471ec 100644
--- a/clang/lib/CodeGen/CGExprAgg.cpp
+++ b/clang/lib/CodeGen/CGExprAgg.cpp
@@ -715,6 +715,9 @@ void AggExprEmitter::EmitArrayInit(Address DestPtr, llvm::ArrayType *AType,
         Builder.CreatePHI(element->getType(), 2, "arrayinit.cur");
     currentElement->addIncoming(element, entryBB);
 
+    if (CGF.CGM.shouldEmitConvergenceTokens())
+      CGF.ConvergenceTokenStack.push_back(CGF.emitConvergenceLoopToken(bodyBB));
+
     // Emit the actual filler expression.
     {
       // C++1z [class.temporary]p5:
@@ -746,6 +749,9 @@ void AggExprEmitter::EmitArrayInit(Address DestPtr, llvm::ArrayType *AType,
     Builder.CreateCondBr(done, endBB, bodyBB);
     currentElement->addIncoming(nextElement, Builder.GetInsertBlock());
 
+    if (CGF.CGM.shouldEmitConvergenceTokens())
+      CGF.ConvergenceTokenStack.pop_back();
+
     CGF.EmitBlock(endBB);
   }
 }
@@ -1987,6 +1993,9 @@ void AggExprEmitter::VisitArrayInitLoopExpr(const ArrayInitLoopExpr *E,
   llvm::Value *element =
       Builder.CreateInBoundsGEP(llvmElementType, begin, index);
 
+  if (CGF.CGM.shouldEmitConvergenceTokens())
+    CGF.ConvergenceTokenStack.push_back(CGF.emitConvergenceLoopToken(bodyBB));
+
   // Prepare for a cleanup.
   QualType::DestructionKind dtorKind = elementType.isDestructedType();
   EHScopeStack::stable_iterator cleanup;
@@ -2034,6 +2043,9 @@ void AggExprEmitter::VisitArrayInitLoopExpr(const ArrayInitLoopExpr *E,
   llvm::BasicBlock *endBB = CGF.createBasicBlock("arrayinit.end");
   Builder.CreateCondBr(done, endBB, bodyBB);
 
+  if (CGF.CGM.shouldEmitConvergenceTokens())
+    CGF.ConvergenceTokenStack.pop_back();
+
   CGF.EmitBlock(endBB);
 
   // Leave the partial-array cleanup if we entered one.
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 29c41893bdbc4..0ef0fa8630f21 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -543,9 +543,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID,
     Value *IndexOp = EmitScalarExpr(E->getArg(1));
 
     llvm::Type *RetTy = ConvertType(E->getType());
-    return Builder.CreateIntrinsic(
-        RetTy, CGM.getHLSLRuntime().getCreateResourceGetPointerIntrinsic(),
-        ArrayRef<Value *>{HandleOp, IndexOp});
+    llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+        &CGM.getModule(),
+        CGM.getHLSLRuntime().getCreateResourceGetPointerIntrinsic(),
+        {RetTy, HandleOp->getType(), IndexOp->getType()});
+    llvm::CallInst *CI = EmitRuntimeCall(IntrFn, {HandleOp, IndexOp});
+    CI->setCallingConv(IntrFn->getCallingConv());
+    return CI;
   }
   case Builtin::BI__builtin_hlsl_resource_sample: {
     Value *HandleOp = EmitScalarExpr(E->getArg(0));
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.cpp b/clang/lib/CodeGen/CGHLSLRuntime.cpp
index 4e6f853890c83..2feb69668d87c 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.cpp
+++ b/clang/lib/CodeGen/CGHLSLRuntime.cpp
@@ -657,8 +657,16 @@ CGHLSLRuntime::emitDXILUserSemanticLoad(llvm::IRBuilder<> &B, llvm::Type *Type,
                             llvm::PoisonValue::get(B.getInt32Ty())};
 
   llvm::Intrinsic::ID IntrinsicID = llvm::Intrinsic::dx_load_input;
-  llvm::Value *Value = B.CreateIntrinsic(/*ReturnType=*/Type, IntrinsicID, Args,
-                                         nullptr, VariableName);
+
+  SmallVector<OperandBundleDef, 1> OB;
+  if (auto *Token = getConvergenceToken(*B.GetInsertBlock())) {
+    llvm::Value *bundleArgs[] = {Token};
+    OB.emplace_back("convergencectrl", bundleArgs);
+  }
+
+  llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+      B.GetInsertBlock()->getModule(), IntrinsicID, {Type});
+  llvm::Value *Value = B.CreateCall(IntrFn, Args, OB, VariableName);
   return Value;
 }
 
@@ -676,7 +684,16 @@ void CGHLSLRuntime::emitDXILUserSemanticStore(llvm::IRBuilder<> &B,
                             Source};
 
   llvm::Intrinsic::ID IntrinsicID = llvm::Intrinsic::dx_store_output;
-  B.CreateIntrinsic(/*ReturnType=*/CGM.VoidTy, IntrinsicID, Args, nullptr);
+
+  SmallVector<OperandBundleDef, 1> OB;
+  if (auto *Token = getConvergenceToken(*B.GetInsertBlock())) {
+    llvm::Value *bundleArgs[] = {Token};
+    OB.emplace_back("convergencectrl", bundleArgs);
+  }
+
+  llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+      B.GetInsertBlock()->getModule(), IntrinsicID, {Source->getType()});
+  B.CreateCall(IntrFn, Args, OB);
 }
 
 llvm::Value *CGHLSLRuntime::emitUserSemanticLoad(
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 0ff93d2ce7363..26c1e6ee00b57 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -5421,11 +5421,11 @@ class CodeGenFunction : public CodeGenTypeCache {
   void maybeAttachRangeForLoad(llvm::LoadInst *Load, QualType Ty,
                                SourceLocation Loc);
 
-private:
   // Emits a convergence_loop instruction for the given |BB|, with |ParentToken|
   // as it's parent convergence instr.
   llvm::ConvergenceControlInst *emitConvergenceLoopToken(llvm::BasicBlock *BB);
 
+private:
   // Adds a convergence_ctrl token with |ParentToken| as parent convergence
   // instr to the call |Input|.
   llvm::CallBase *addConvergenceControlToken(llvm::CallBase *Input);
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 0a697c84b66a7..bfd879f21d8b6 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1811,7 +1811,7 @@ class CodeGenModule : public CodeGenTypeCache {
   bool shouldEmitConvergenceTokens() const {
     // TODO: this should probably become unconditional once the controlled
     // convergence becomes the norm.
-    return getTriple().isSPIRVLogical();
+    return getTriple().isSPIRVLogical() || getTriple().isDXIL();
   }
 
   void addUndefinedGlobalForTailCall(
diff --git a/clang/test/CodeGenDirectX/Builtins/dot2add.c b/clang/test/CodeGenDirectX/Builtins/dot2add.c
index 4275a285012b0..bc5073995522e 100644
--- a/clang/test/CodeGenDirectX/Builtins/dot2add.c
+++ b/clang/test/CodeGenDirectX/Builtins/dot2add.c
@@ -8,6 +8,7 @@ typedef half half2 __attribute__((ext_vector_type(2)));
 // CHECK-LABEL: define float @test_dot2add(
 // CHECK-SAME: <2 x half> noundef [[X:%.*]], <2 x half> noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X_ADDR:%.*]] = alloca <2 x half>, align 2
 // CHECK-NEXT:    [[Y_ADDR:%.*]] = alloca <2 x half>, align 2
 // CHECK-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
diff --git a/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl b/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
index 832c4ac9b10f5..b4235eed318e4 100644
--- a/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
+++ b/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
@@ -3,12 +3,14 @@
 typedef int Foo[2];
 
 // CHECK-LABEL: define void {{.*}}boop{{.*}}(ptr dead_on_unwind noalias writable sret([2 x i32]) align 4 %agg.result)
+// CHECK:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK: [[G:%.*]] = alloca [2 x i32], align 4
 // CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[G]], ptr align 4 {{.*}}, i32 8, i1 false)
 // CHECK-NEXT: [[AIB:%.*]] = getelementptr inbounds [2 x i32], ptr %agg.result, i32 0, i32 0
 // CHECK-NEXT: br label %arrayinit.body
 // CHECK: arrayinit.body:
 // CHECK-NEXT: [[AII:%.*]] = phi i32 [ 0, %entry ], [ %arrayinit.next, %arrayinit.body ]
+// CHECK-NEXT: %[[#CV_LOOP:]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %[[#C_ENTRY]]) ]
 // CHECK-NEXT: [[X:%.*]] = getelementptr inbounds i32, ptr [[AIB]], i32 [[AII]]
 // CHECK-NEXT: [[AI:%.*]] = getelementptr inbounds nuw [2 x i32], ptr [[G]], i32 0, i32 [[AII]]
 // CHECK-NEXT: [[Y:%.*]] = load i32, ptr [[AI]], align 4
diff --git a/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl b/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
index 3d7b8a906cdae..b34fd190a057c 100644
--- a/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
+++ b/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
@@ -66,6 +66,7 @@ struct UnnamedDerived : UnnamedOnly {};
 // CHECK-LABEL: define hidden void @_Z5case1v(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i32(ptr align 1 [[AGG_RESULT]], ptr align 1 @__const._Z5case1v.TF1, i32 8, i1 false)
 // CHECK-NEXT:    ret void
 //
@@ -78,6 +79,7 @@ TwoFloats case1() {
 // CHECK-LABEL: define hidden void @_Z5case2v(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i32(ptr align 1 [[AGG_RESULT]], ptr align 1 @__const._Z5case2v.TF2, i32 8, i1 false)
 // CHECK-NEXT:    ret void
 //
@@ -90,6 +92,7 @@ TwoFloats case2() {
 // CHECK-LABEL: define hidden void @_Z5case3i(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], i32 noundef [[VAL:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[VAL_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    store i32 [[VAL]], ptr [[VAL_ADDR]], align 4
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -110,6 +113,7 @@ TwoFloats case3(int Val) {
 // CHECK-LABEL: define hidden void @_Z5case4Dv2_i(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], <2 x i32> noundef [[TWOVALS:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[TWOVALS_ADDR:%.*]] = alloca <2 x i32>, align 4
 // CHECK-NEXT:    store <2 x i32> [[TWOVALS]], ptr [[TWOVALS_ADDR]], align 4
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -133,6 +137,7 @@ TwoFloats case4(int2 TwoVals) {
 // CHECK-LABEL: define hidden void @_Z5case5Dv2_i(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], <2 x i32> noundef [[TWOVALS:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[TWOVALS_ADDR:%.*]] = alloca <2 x i32>, align 4
 // CHECK-NEXT:    store <2 x i32> [[TWOVALS]], ptr [[TWOVALS_ADDR]], align 4
 // CHECK-NEXT:    [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -155,6 +160,7 @@ TwoInts case5(int2 TwoVals) {
 // CHECK-LABEL: define hidden void @_Z5case69TwoFloats(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF4:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[TF4]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[X]], align 1
@@ -177,6 +183,7 @@ TwoInts case6(TwoFloats TF4) {
 // CHECK-LABEL: define hidden void @_Z5case77TwoIntsS_i9TwoFloatsS0_S0_S0_(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_DOGGO:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOINTS:%.*]]) align 1 [[TI1:%.*]], ptr noundef byval([[STRUCT_TWOINTS]]) align 1 [[TI2:%.*]], i32 noundef [[VAL:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF1:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF2:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF3:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF4:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[VAL_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    store i32 [[VAL]], ptr [[VAL_ADDR]], align 4
 // CHECK-NEXT:    [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -241,6 +248,7 @@ Doggo case7(TwoInts TI1, TwoInts TI2, int Val, TwoFloats TF1, TwoFloats TF2,
 // CHECK-LABEL: define hidden void @_Z5case85Doggo(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_ANIMALBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_DOGGO:%.*]]) align 1 [[D1:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[LEGS:%.*]] = getelementptr inbounds nuw [[STRUCT_ANIMALBITS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[D1]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, ptr [[LEGSTATE]], align 1
@@ -327,6 +335,7 @@ AnimalBits case8(Doggo D1) {
 // CHECK-LABEL: define hidden void @_Z5case95Doggo10AnimalBits(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_ZOO:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_DOGGO:%.*]]) align 1 [[D1:%.*]], ptr noundef byval([[STRUCT_ANIMALBITS:%.*]]) align 1 [[A1:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[DOGS:%.*]] = getelementptr inbounds nuw [[STRUCT_ZOO]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[DOGS]], i32 0, i32 0
 // CHECK-NEXT:    [[LEGSTATE1:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[D1]], i32 0, i32 0
@@ -743,6 +752,7 @@ Zoo case9(Doggo D1, AnimalBits A1) {
 // CHECK-LABEL: define hidden void @_Z6case109TwoFloatsS_(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF1:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF2:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[X1:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[TF1]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[X1]], align 1
@@ -770,6 +780,7 @@ FourFloats case10(TwoFloats TF1, TwoFloats TF2) {
 // CHECK-LABEL: define hidden void @_Z6case11f(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], float noundef nofpclass(nan inf) [[F:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    [[REF_TMP:%.*]] = alloca <4 x float>, align 4
 // CHECK-NEXT:    [[REF_TMP1:%.*]] = alloca <4 x float>, align 4
@@ -819,6 +830,7 @@ FourFloats case11(float F) {
 // CHECK-LABEL: define hidden void @_Z6case12ii(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_SLICYBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], i32 noundef [[I:%.*]], i32 noundef [[J:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    [[J_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
@@ -841,6 +853,7 @@ SlicyBits case12(int I, int J) {
 // CHECK-LABEL: define hidden void @_Z6case137TwoInts(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_SLICYBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOINTS:%.*]]) align 1 [[TI:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[TI]], i32 0, i32 0
 // CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[Z]], align 1
 // CHECK-NEXT:    [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
@@ -861,6 +874,7 @@ SlicyBits case13(TwoInts TI) {
 // CHECK-LABEL: define hidden void @_Z6case149SlicyBits(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_SLICYBITS:%.*]]) align 1 [[SB:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[BF_LOAD:%.*]] = load i8, ptr [[SB]], align 1
 // CHECK-NEXT:    [[BF_CAST:%.*]] = sext i8 [[BF_LOAD]] to i32
@@ -881,6 +895,7 @@ TwoInts case14(SlicyBits SB) {
 // CHECK-LABEL: define hidden void @_Z6case159SlicyBits(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_SLICYBITS:%.*]]) align 1 [[SB:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
 // CHECK-NEXT:    [[BF_LOAD:%.*]] = load i8, ptr [[SB]], align 1
 // CHECK-NEXT:    [[BF_CAST:%.*]] = sext i8 [[BF_LOAD]] to i32
@@ -904,6 +919,7 @@ TwoFloats case15(SlicyBits SB) {
 // CHECK-LABEL: define hidden void @_Z7makeTwoRf(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noalias noundef nonnull align 4 dereferenceable(4) [[X:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X_ADDR:%.*]] = alloca ptr, align 4
 // CHECK-NEXT:    store ptr [[X]], ptr [[X_ADDR]], align 4
 // CHECK-NEXT:    [[X1:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -930,6 +946,7 @@ TwoFloats makeTwo(inout float X) {
 // CHECK-LABEL: define hidden void @_Z6case16v(
 // CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X:%.*]] = alloca float, align 4
 // CHECK-NEXT:    [[REF_TMP:%.*]] = alloca [[STRUCT_TWOFLOATS:%.*]], align 1
 // CHECK-NEXT:    [[TMP:%.*]] = alloca float, align 4
@@ -963,6 +980,7 @@ FourFloats case16() {
 // CHECK-LABEL: define hidden noundef i32 @_Z12case17Helperi(
 // CHECK-SAME: i32 noundef [[X:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
 // CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
@@ -976,6 +994,7 @@ int case17Helper(int x) {
 // CHECK-LABEL: define hidden void @_Z6case17v(
 // CHECK-SAME: ) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
 // CHECK-NEXT:    [[X:%.*]] = alloca <2 x i32>, align 4
 // CHECK-NEXT:    [[CALL:%.*]] = call noundef i32 @_Z12case17Helperi(i32 noundef 0) #[[ATTR2]]
 // CHECK-NEXT:    [[CALL1:%.*]] = call noundef i32 @_Z12case17Helperi(i32 nou...
[truncated]

Icohedron

LGTM

bob80905

Looks good to my untrained eye. I would recommend updating the new test descriptions, and maybe even being a bit verbose with them.

The `SPIRVStripConvergenceIntrinsic` pass was written as a SPIR-V pass because SPIR-V is currently the only target that emits convergence tokens during codegen. Nothing in the pass is target-specific, and we plan to emit convergence tokens when targeting DirectX (and all targets in general), so move the pass to a common place. The previous pass used temporary `Undef`s; as part of moving the pass, we can simply reverse the traversal order and drop the use of `Undef`, which is deprecated. Enables the pass when targeting DirectX and is a prerequisite for #188792. Assisted by: GitHub Copilot
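Why reversing the traversal removes the need for placeholders can be shown on a toy instruction list. This is only a sketch under simplifying assumptions, not the actual pass: `ToyInst`, `hasUsers`, and `stripConvergence` are hypothetical, and the model is a single straight-line body in which every user of a token comes after its definition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction: it may define a token and may reference one
// through a "convergencectrl" bundle.
struct ToyInst {
  std::string Def;             // token defined here, or "" if none
  std::string TokenUse;        // token referenced by the bundle, or ""
  bool IsConvergenceIntrinsic; // true for entry/loop style intrinsics
};

// Returns true while any instruction still references Tok.
bool hasUsers(const std::vector<ToyInst> &Body, const std::string &Tok) {
  return std::any_of(Body.begin(), Body.end(),
                     [&](const ToyInst &I) { return I.TokenUse == Tok; });
}

// Walks the body bottom-up. By the time a token definition is reached,
// every bundle that referenced it has already been dropped, so the
// definition can be erased without substituting a placeholder value.
void stripConvergence(std::vector<ToyInst> &Body) {
  for (std::size_t I = Body.size(); I-- > 0;) {
    Body[I].TokenUse.clear(); // drop the bundle, keep the instruction
    if (Body[I].IsConvergenceIntrinsic) {
      assert(!hasUsers(Body, Body[I].Def) && "token still in use");
      Body.erase(Body.begin() + static_cast<std::ptrdiff_t>(I));
    }
  }
}
```

A top-down walk would reach the entry token first, while later instructions still reference it. That is the situation a temporary stand-in value would otherwise have to paper over.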

inbelic · 2026-04-16T16:25:47Z

@Keenuts I will merge this tomorrow after I confirm the HLSL test suite is passing as expected. Let me know if I should hold until you get a chance to review. Thanks

github-actions · 2026-04-16T16:59:18Z

🪟 Windows x64 Test Results

135355 tests passed
4455 tests skipped

✅ The build succeeded and all tests passed.

These changes were introduced between the PR being opened and now. Fix them in the same manner as before.

hekota

Just a few nits in the test changes.

hekota · 2026-04-16T19:15:19Z

 Texture2D<float4> t;

-// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #1 {
+// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #2 {


Suggested change

// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #2 {

// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc)

hekota · 2026-04-16T19:16:30Z

+// CHECK: %hlsl.f32tof16 = call i32 @llvm.dx.legacyf32tof16.f32(float %[[#]])
 // CHECK: ret i32 %hlsl.f32tof16
-// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #1
+// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #2


Suggested change

// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #2

// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float)

The function attributes are not important and can be removed. This applies to all of the other files you needed to update that have the same pattern.

hekota · 2026-04-16T19:32:43Z

+  // CHECK:       %call1 = call noundef i32 @_ZN4Pair8getFirstEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals) #{{[0-9]+}} [ "convergencectrl"(token %[[#C_ENTRY]]) ]
  // CHECK-NEXT:  %First = getelementptr inbounds nuw %struct.Pair, ptr %Vals, i32 0, i32 0
-  // CHECK-NEXT:  store i32 %call, ptr %First, align 1
-  // CHECK-NEXT:  %call1 = call reassoc nnan ninf nsz arcp afn noundef nofpclass(nan inf) float @_ZN4Pair9getSecondEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals)
+  // CHECK-NEXT:  store i32 %call1, ptr %First, align 1
+  // CHECK-NEXT:  %call2 = call reassoc nnan ninf nsz arcp afn noundef nofpclass(nan inf) float @_ZN4Pair9getSecondEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals) #{{[0-9]+}} [ "convergencectrl"(token %[[#C_ENTRY]]) ]


This could use a regex for call1 and call2.
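One way to read this suggestion, as a sketch only: capture each call result in a FileCheck string variable instead of matching the literal SSA names. The variable names `CALL1` and `CALL2` are hypothetical and not taken from the PR, and `C_ENTRY` is assumed to be defined by an earlier check line, as in the diff above.

```hlsl
// Sketch: [[NAME:regex]] defines a FileCheck variable, [[NAME]] reuses it.
// CALL1 and CALL2 are illustrative names, not the ones used in the PR.
// CHECK:       %[[CALL1:.+]] = call noundef i32 @_ZN4Pair8getFirstEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals) #{{[0-9]+}} [ "convergencectrl"(token %[[#C_ENTRY]]) ]
// CHECK-NEXT:  %First = getelementptr inbounds nuw %struct.Pair, ptr %Vals, i32 0, i32 0
// CHECK-NEXT:  store i32 %[[CALL1]], ptr %First, align 1
// CHECK-NEXT:  %[[CALL2:.+]] = call reassoc nnan ninf nsz arcp afn noundef nofpclass(nan inf) float @_ZN4Pair9getSecondEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals) #{{[0-9]+}} [ "convergencectrl"(token %[[#C_ENTRY]]) ]
```

With captures like these, the checks would keep passing if codegen renumbers `%call1` and `%call2` again, while still verifying that the stored value is the result of the first call.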

…g DirectX" (#193090) This change appears to introduce complications when attempting a full loop unroll, as exhibited here: https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618. This results in invalid DXIL as the unreachable branch is not correctly cleaned up. Initial leads suggest this is because instructions with convergence control tokens are still used for analysis when they are within an unreachable branch. Reverts #188792

…rge attempt) (#193584) Any expression that accesses a resource or resource array member of a global struct instance must be replaced during codegen by an access of the corresponding implicit global resource variable. When codegen encounters a `MemberExpr` of a resource type, it traverses the AST to locate the parent struct declaration, building the expected global resource variable name along the way. If the parent declaration is a non-static global struct instance, codegen searches its `HLSLAssociatedResourceDeclAttr` attributes to locate the matching global resource variable and then generates IR code to access the resource global in place of the member access. Fixes #182989 This is the second try to land this. The [first one](#187127 with #188792 and both PRs had to be reverted. No updates needed to this change. I synced with @inbelic and we agreed that this one should go in first.

…ly` and `IntrReadMem` (#193593) `IntrConvergent` was originally added to `dx.resource.getpointer` to prevent optimization passes (`SimplifyCFG`, `GVN`) from sinking the intrinsic out of control flow branches, which would create phi nodes on the returned pointer. Using `IntrInaccessibleMemOnly` and `IntrReadMem` semantics still prevents passes from merging or sinking identical calls across branches. However, this allows the call to be moved within a single control flow path. Updates relevant tests and adds a new test to demonstrate a now legal potential optimization. This was discovered when #188792 caused the following failure: https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618. When emitting convergence control tokens, each resource access becomes a user of the convergence control tokens, which makes its use unnecessarily restrictive for optimizations and in this case would prevent a loop unroll from taking place. Assisted by: Claude Opus 4.6

…g DirectX" (#194452) The initial landing surfaced 3 somewhat orthogonal issues related to loop unrolling. These are addressed [here](#193592), [here](#193593) and [here](#193590). They caused these [tests](https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618#step:8:87913) to fail in the offload test suite. We can verify that these are now passing as expected (fixing any of the 3 issues would resolve this and allow us to reland). Some additional tests were added since the revert; these are now accounted for and updated in the reland fixes commit. This relands #188792

inbelic added 9 commits March 26, 2026 15:56

[HLSL] Enable convergence tokens

1df295d

[HLSL] Emit convergence loop tokens for array init

406f64f

This follows the previous correction made in llvm#140120

[HLSL] Consolidate types of convergence

a21ac59

When emitting intrinsics marked as convergent we need to annotate them with the correct convergence tokens

[Tests] Ensure loop exit condition isn't hoisted

9abbdff

[Tests] Ensure convergent token behaviour is consistent across dx/spirv

ce30819

[Tests] Ensure loop convergent token is correctly generated for array init

f3d3401

[Tests] Update CHECK-NEXT tests with entry.convergence call

bac680c

This commit fixes tests that use a CHECK-NEXT by inserting a CHECK-NEXT for the newly added intrinsic call

[Tests] Remove hard-coded register checks

4d571c5

These files were checking hard-coded SSA register names. This makes them inflexible to any codegen updates (like this change). Update the checks to match in a flexible manner

clang-format

4b64c47

inbelic mentioned this pull request Mar 26, 2026

[NFC][SPIRV] Move SPIRVStripConvergenceIntrinsics to Utils #188537

Merged

inbelic marked this pull request as ready for review March 26, 2026 19:56

inbelic changed the title ~~[HLSL] Emit convergence control tokens when targeting DirectX~~ [HLSL][DirectX] Emit convergence control tokens when targeting DirectX Mar 26, 2026

llvmbot added clang:codegen IR generation bugs: mangling, exceptions, etc. backend:DirectX HLSL HLSL Language Support llvm:transforms labels Mar 26, 2026

s-perron requested a review from Keenuts March 27, 2026 00:46

Icohedron approved these changes Apr 8, 2026

bob80905 approved these changes Apr 13, 2026

Merge branch 'main' into inbelic/conv-ctrl

3dc2c3d

inbelic added 2 commits April 16, 2026 18:11

review: expand test comments for clarity

68e7737

correct build failures

f7a15ef

These changes were introduced between pr open and now. Fixing them in the same manner as before

hekota reviewed Apr 16, 2026

review: improve checks

dcd46e1

inbelic merged commit 2c8c2bd into llvm:main Apr 20, 2026
10 of 11 checks passed

inbelic mentioned this pull request Apr 20, 2026

Revert "[HLSL][DirectX] Emit convergence control tokens when targeting DirectX" #193090

Merged

hekota mentioned this pull request Apr 22, 2026

[HLSL] Add codegen for accessing resource members of a struct (2nd merge attempt) #193584

Merged

This was referenced Apr 22, 2026

[Analysis] Ignore convergence tokens in dead branches in CodeMetrics #193590

Open

[DirectX] Denote dx.resource.getpointer with IntrInaccessibleMemOnly and IntrReadMem #193593

Merged

bob80905 mentioned this pull request Apr 30, 2026

Reland "[HLSL][DirectX] Emit convergence control tokens when targeting DirectX" #194452

Merged

[HLSL][DirectX] Emit convergence control tokens when targeting DirectX #188792

inbelic merged 13 commits into llvm:main from inbelic:inbelic/conv-ctrl

6 participants
