[HLSL][DirectX] Emit convergence control tokens when targeting DirectX #188792
This follows the previous correction made in llvm#140120
When emitting intrinsics marked as convergent, we need to annotate them with the correct convergence tokens.
This commit fixes tests that use CHECK-NEXT by inserting a CHECK-NEXT for the newly added intrinsic call.
These files were checking hard-coded SSA register names, which makes them inflexible to any codegen updates (like this change). Update the checks to match in a flexible manner.
void maybeAttachRangeForLoad(llvm::LoadInst *Load, QualType Ty,
                             SourceLocation Loc);

private:
This was previously a public method, but it was made private in a clean-up PR because it had no uses at the time and the same effect can be achieved with public members.
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-backend-directx
Author: Finn Plummer (inbelic)
Changes
This PR allows codegen to generate convergence control tokens. These give a more accurate description of convergence behaviour, so that invalid control flow graph transforms are prevented (and valid ones remain allowed). As noted, the use of convergence control tokens is the ideal norm, and this PR follows that by enabling it for `DirectX`. It is being done now to prevent a convergent loop exit condition from being illegally moved across control flow; test cases for this are explicitly added.
Please see the individual commits, which group logically similar changes. Unfortunately, it is tricky to stage this in smaller individual commits.
Resolves #180621. #188537 is a prerequisite for this to pass the HLSL offload suite tests.
Assisted by: Github Copilot
Patch is 224.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/188792.diff
72 Files Affected:
diff --git a/clang/lib/CodeGen/CGExprAgg.cpp b/clang/lib/CodeGen/CGExprAgg.cpp
index 3a4291719da74..d3dc1014471ec 100644
--- a/clang/lib/CodeGen/CGExprAgg.cpp
+++ b/clang/lib/CodeGen/CGExprAgg.cpp
@@ -715,6 +715,9 @@ void AggExprEmitter::EmitArrayInit(Address DestPtr, llvm::ArrayType *AType,
Builder.CreatePHI(element->getType(), 2, "arrayinit.cur");
currentElement->addIncoming(element, entryBB);
+ if (CGF.CGM.shouldEmitConvergenceTokens())
+ CGF.ConvergenceTokenStack.push_back(CGF.emitConvergenceLoopToken(bodyBB));
+
// Emit the actual filler expression.
{
// C++1z [class.temporary]p5:
@@ -746,6 +749,9 @@ void AggExprEmitter::EmitArrayInit(Address DestPtr, llvm::ArrayType *AType,
Builder.CreateCondBr(done, endBB, bodyBB);
currentElement->addIncoming(nextElement, Builder.GetInsertBlock());
+ if (CGF.CGM.shouldEmitConvergenceTokens())
+ CGF.ConvergenceTokenStack.pop_back();
+
CGF.EmitBlock(endBB);
}
}
@@ -1987,6 +1993,9 @@ void AggExprEmitter::VisitArrayInitLoopExpr(const ArrayInitLoopExpr *E,
llvm::Value *element =
Builder.CreateInBoundsGEP(llvmElementType, begin, index);
+ if (CGF.CGM.shouldEmitConvergenceTokens())
+ CGF.ConvergenceTokenStack.push_back(CGF.emitConvergenceLoopToken(bodyBB));
+
// Prepare for a cleanup.
QualType::DestructionKind dtorKind = elementType.isDestructedType();
EHScopeStack::stable_iterator cleanup;
@@ -2034,6 +2043,9 @@ void AggExprEmitter::VisitArrayInitLoopExpr(const ArrayInitLoopExpr *E,
llvm::BasicBlock *endBB = CGF.createBasicBlock("arrayinit.end");
Builder.CreateCondBr(done, endBB, bodyBB);
+ if (CGF.CGM.shouldEmitConvergenceTokens())
+ CGF.ConvergenceTokenStack.pop_back();
+
CGF.EmitBlock(endBB);
// Leave the partial-array cleanup if we entered one.
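The `push_back`/`pop_back` pairs in the `CGExprAgg.cpp` hunks bracket each emitted array-init loop with a `convergence.loop` token, and the updated `ArrayReturn.hlsl` test checks that this loop token's `convergencectrl` operand is the function's entry token. One way to make such pairing harder to get wrong is an RAII guard. The sketch below is a standalone model, not clang code: `ConvToken`, `TokenStack`, and `LoopTokenScope` are hypothetical names, the `Enabled` flag stands in for `CGF.CGM.shouldEmitConvergenceTokens()`, and it assumes (as the CHECK lines suggest) that a new loop token takes the current top of the stack as its parent.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a convergence control token such as
// llvm.experimental.convergence.entry or .loop (not an LLVM type).
struct ConvToken {
  std::string kind;        // "entry" or "loop"
  const ConvToken *parent; // token passed via "convergencectrl", or nullptr
};

// Hypothetical model of CodeGenFunction::ConvergenceTokenStack.
struct TokenStack {
  std::vector<std::unique_ptr<ConvToken>> owned; // owns emitted loop tokens
  std::vector<const ConvToken *> stack;          // currently active tokens

  const ConvToken *top() const {
    return stack.empty() ? nullptr : stack.back();
  }

  // Models emitting a loop token: its parent is the current top of stack.
  const ConvToken *emitLoopToken() {
    owned.push_back(std::make_unique<ConvToken>(ConvToken{"loop", top()}));
    return owned.back().get();
  }
};

// RAII guard: pushes a loop token on construction and pops it on
// destruction, so each push_back has exactly one matching pop_back.
class LoopTokenScope {
public:
  LoopTokenScope(TokenStack &Stack, bool Enabled) : S(Stack), On(Enabled) {
    if (On)
      S.stack.push_back(S.emitLoopToken());
  }
  ~LoopTokenScope() {
    if (On)
      S.stack.pop_back();
  }
  LoopTokenScope(const LoopTokenScope &) = delete;
  LoopTokenScope &operator=(const LoopTokenScope &) = delete;

private:
  TokenStack &S;
  bool On;
};
```

With a guard like this, the scope would have to end right after the back-edge branch is created, which is where the patch calls `pop_back()` before `EmitBlock(endBB)`; nested loops then naturally get a loop token whose parent is the enclosing loop token.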
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 29c41893bdbc4..0ef0fa8630f21 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -543,9 +543,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID,
Value *IndexOp = EmitScalarExpr(E->getArg(1));
llvm::Type *RetTy = ConvertType(E->getType());
- return Builder.CreateIntrinsic(
- RetTy, CGM.getHLSLRuntime().getCreateResourceGetPointerIntrinsic(),
- ArrayRef<Value *>{HandleOp, IndexOp});
+ llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+ &CGM.getModule(),
+ CGM.getHLSLRuntime().getCreateResourceGetPointerIntrinsic(),
+ {RetTy, HandleOp->getType(), IndexOp->getType()});
+ llvm::CallInst *CI = EmitRuntimeCall(IntrFn, {HandleOp, IndexOp});
+ CI->setCallingConv(IntrFn->getCallingConv());
+ return CI;
}
case Builtin::BI__builtin_hlsl_resource_sample: {
Value *HandleOp = EmitScalarExpr(E->getArg(0));
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.cpp b/clang/lib/CodeGen/CGHLSLRuntime.cpp
index 4e6f853890c83..2feb69668d87c 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.cpp
+++ b/clang/lib/CodeGen/CGHLSLRuntime.cpp
@@ -657,8 +657,16 @@ CGHLSLRuntime::emitDXILUserSemanticLoad(llvm::IRBuilder<> &B, llvm::Type *Type,
llvm::PoisonValue::get(B.getInt32Ty())};
llvm::Intrinsic::ID IntrinsicID = llvm::Intrinsic::dx_load_input;
- llvm::Value *Value = B.CreateIntrinsic(/*ReturnType=*/Type, IntrinsicID, Args,
- nullptr, VariableName);
+
+ SmallVector<OperandBundleDef, 1> OB;
+ if (auto *Token = getConvergenceToken(*B.GetInsertBlock())) {
+ llvm::Value *bundleArgs[] = {Token};
+ OB.emplace_back("convergencectrl", bundleArgs);
+ }
+
+ llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+ B.GetInsertBlock()->getModule(), IntrinsicID, {Type});
+ llvm::Value *Value = B.CreateCall(IntrFn, Args, OB, VariableName);
return Value;
}
@@ -676,7 +684,16 @@ void CGHLSLRuntime::emitDXILUserSemanticStore(llvm::IRBuilder<> &B,
Source};
llvm::Intrinsic::ID IntrinsicID = llvm::Intrinsic::dx_store_output;
- B.CreateIntrinsic(/*ReturnType=*/CGM.VoidTy, IntrinsicID, Args, nullptr);
+
+ SmallVector<OperandBundleDef, 1> OB;
+ if (auto *Token = getConvergenceToken(*B.GetInsertBlock())) {
+ llvm::Value *bundleArgs[] = {Token};
+ OB.emplace_back("convergencectrl", bundleArgs);
+ }
+
+ llvm::Function *IntrFn = llvm::Intrinsic::getOrInsertDeclaration(
+ B.GetInsertBlock()->getModule(), IntrinsicID, {Source->getType()});
+ B.CreateCall(IntrFn, Args, OB);
}
llvm::Value *CGHLSLRuntime::emitUserSemanticLoad(
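Both `emitDXILUserSemanticLoad` and `emitDXILUserSemanticStore` now build the same `convergencectrl` bundle list, leaving it empty when `getConvergenceToken` yields no token, so a small shared helper could remove the duplication. The sketch below models only that decision with plain standard-library types; `TokenValue`, `BundleDef`, and `convergenceBundles` are hypothetical stand-ins for `llvm::Value`, `llvm::OperandBundleDef`, and a helper that the patch does not define.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an llvm::Value that holds a convergence token.
struct TokenValue {
  std::string name;
};

// Hypothetical stand-in for llvm::OperandBundleDef: a tag plus its inputs.
struct BundleDef {
  std::string tag;
  std::vector<TokenValue *> inputs;
};

// Returns the operand bundles to attach to an intrinsic call: one
// "convergencectrl" bundle carrying the token when one is available,
// and an empty list otherwise (the call is then emitted without bundles).
std::vector<BundleDef> convergenceBundles(TokenValue *token) {
  std::vector<BundleDef> bundles;
  if (token)
    bundles.push_back(BundleDef{"convergencectrl", {token}});
  return bundles;
}
```

In the patch itself, the equivalent list is passed straight to `B.CreateCall(IntrFn, Args, OB, ...)`.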
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 0ff93d2ce7363..26c1e6ee00b57 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -5421,11 +5421,11 @@ class CodeGenFunction : public CodeGenTypeCache {
void maybeAttachRangeForLoad(llvm::LoadInst *Load, QualType Ty,
SourceLocation Loc);
-private:
// Emits a convergence_loop instruction for the given |BB|, with |ParentToken|
// as it's parent convergence instr.
llvm::ConvergenceControlInst *emitConvergenceLoopToken(llvm::BasicBlock *BB);
+private:
// Adds a convergence_ctrl token with |ParentToken| as parent convergence
// instr to the call |Input|.
llvm::CallBase *addConvergenceControlToken(llvm::CallBase *Input);
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 0a697c84b66a7..bfd879f21d8b6 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1811,7 +1811,7 @@ class CodeGenModule : public CodeGenTypeCache {
bool shouldEmitConvergenceTokens() const {
// TODO: this should probably become unconditional once the controlled
// convergence becomes the norm.
- return getTriple().isSPIRVLogical();
+ return getTriple().isSPIRVLogical() || getTriple().isDXIL();
}
void addUndefinedGlobalForTailCall(
diff --git a/clang/test/CodeGenDirectX/Builtins/dot2add.c b/clang/test/CodeGenDirectX/Builtins/dot2add.c
index 4275a285012b0..bc5073995522e 100644
--- a/clang/test/CodeGenDirectX/Builtins/dot2add.c
+++ b/clang/test/CodeGenDirectX/Builtins/dot2add.c
@@ -8,6 +8,7 @@ typedef half half2 __attribute__((ext_vector_type(2)));
// CHECK-LABEL: define float @test_dot2add(
// CHECK-SAME: <2 x half> noundef [[X:%.*]], <2 x half> noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[X_ADDR:%.*]] = alloca <2 x half>, align 2
// CHECK-NEXT: [[Y_ADDR:%.*]] = alloca <2 x half>, align 2
// CHECK-NEXT: [[Z_ADDR:%.*]] = alloca float, align 4
diff --git a/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl b/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
index 832c4ac9b10f5..b4235eed318e4 100644
--- a/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
+++ b/clang/test/CodeGenHLSL/BasicFeatures/ArrayReturn.hlsl
@@ -3,12 +3,14 @@
typedef int Foo[2];
// CHECK-LABEL: define void {{.*}}boop{{.*}}(ptr dead_on_unwind noalias writable sret([2 x i32]) align 4 %agg.result)
+// CHECK: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK: [[G:%.*]] = alloca [2 x i32], align 4
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[G]], ptr align 4 {{.*}}, i32 8, i1 false)
// CHECK-NEXT: [[AIB:%.*]] = getelementptr inbounds [2 x i32], ptr %agg.result, i32 0, i32 0
// CHECK-NEXT: br label %arrayinit.body
// CHECK: arrayinit.body:
// CHECK-NEXT: [[AII:%.*]] = phi i32 [ 0, %entry ], [ %arrayinit.next, %arrayinit.body ]
+// CHECK-NEXT: %[[#CV_LOOP:]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %[[#C_ENTRY]]) ]
// CHECK-NEXT: [[X:%.*]] = getelementptr inbounds i32, ptr [[AIB]], i32 [[AII]]
// CHECK-NEXT: [[AI:%.*]] = getelementptr inbounds nuw [2 x i32], ptr [[G]], i32 0, i32 [[AII]]
// CHECK-NEXT: [[Y:%.*]] = load i32, ptr [[AI]], align 4
diff --git a/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl b/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
index 3d7b8a906cdae..b34fd190a057c 100644
--- a/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
+++ b/clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
@@ -66,6 +66,7 @@ struct UnnamedDerived : UnnamedOnly {};
// CHECK-LABEL: define hidden void @_Z5case1v(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0:[0-9]+]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 1 [[AGG_RESULT]], ptr align 1 @__const._Z5case1v.TF1, i32 8, i1 false)
// CHECK-NEXT: ret void
//
@@ -78,6 +79,7 @@ TwoFloats case1() {
// CHECK-LABEL: define hidden void @_Z5case2v(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 1 [[AGG_RESULT]], ptr align 1 @__const._Z5case2v.TF2, i32 8, i1 false)
// CHECK-NEXT: ret void
//
@@ -90,6 +92,7 @@ TwoFloats case2() {
// CHECK-LABEL: define hidden void @_Z5case3i(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], i32 noundef [[VAL:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[VAL_ADDR:%.*]] = alloca i32, align 4
// CHECK-NEXT: store i32 [[VAL]], ptr [[VAL_ADDR]], align 4
// CHECK-NEXT: [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -110,6 +113,7 @@ TwoFloats case3(int Val) {
// CHECK-LABEL: define hidden void @_Z5case4Dv2_i(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], <2 x i32> noundef [[TWOVALS:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[TWOVALS_ADDR:%.*]] = alloca <2 x i32>, align 4
// CHECK-NEXT: store <2 x i32> [[TWOVALS]], ptr [[TWOVALS_ADDR]], align 4
// CHECK-NEXT: [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -133,6 +137,7 @@ TwoFloats case4(int2 TwoVals) {
// CHECK-LABEL: define hidden void @_Z5case5Dv2_i(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], <2 x i32> noundef [[TWOVALS:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[TWOVALS_ADDR:%.*]] = alloca <2 x i32>, align 4
// CHECK-NEXT: store <2 x i32> [[TWOVALS]], ptr [[TWOVALS_ADDR]], align 4
// CHECK-NEXT: [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -155,6 +160,7 @@ TwoInts case5(int2 TwoVals) {
// CHECK-LABEL: define hidden void @_Z5case69TwoFloats(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF4:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
// CHECK-NEXT: [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[TF4]], i32 0, i32 0
// CHECK-NEXT: [[TMP0:%.*]] = load float, ptr [[X]], align 1
@@ -177,6 +183,7 @@ TwoInts case6(TwoFloats TF4) {
// CHECK-LABEL: define hidden void @_Z5case77TwoIntsS_i9TwoFloatsS0_S0_S0_(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_DOGGO:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOINTS:%.*]]) align 1 [[TI1:%.*]], ptr noundef byval([[STRUCT_TWOINTS]]) align 1 [[TI2:%.*]], i32 noundef [[VAL:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF1:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF2:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF3:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF4:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[VAL_ADDR:%.*]] = alloca i32, align 4
// CHECK-NEXT: store i32 [[VAL]], ptr [[VAL_ADDR]], align 4
// CHECK-NEXT: [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -241,6 +248,7 @@ Doggo case7(TwoInts TI1, TwoInts TI2, int Val, TwoFloats TF1, TwoFloats TF2,
// CHECK-LABEL: define hidden void @_Z5case85Doggo(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_ANIMALBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_DOGGO:%.*]]) align 1 [[D1:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[LEGS:%.*]] = getelementptr inbounds nuw [[STRUCT_ANIMALBITS]], ptr [[AGG_RESULT]], i32 0, i32 0
// CHECK-NEXT: [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[D1]], i32 0, i32 0
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, ptr [[LEGSTATE]], align 1
@@ -327,6 +335,7 @@ AnimalBits case8(Doggo D1) {
// CHECK-LABEL: define hidden void @_Z5case95Doggo10AnimalBits(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_ZOO:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_DOGGO:%.*]]) align 1 [[D1:%.*]], ptr noundef byval([[STRUCT_ANIMALBITS:%.*]]) align 1 [[A1:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[DOGS:%.*]] = getelementptr inbounds nuw [[STRUCT_ZOO]], ptr [[AGG_RESULT]], i32 0, i32 0
// CHECK-NEXT: [[LEGSTATE:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[DOGS]], i32 0, i32 0
// CHECK-NEXT: [[LEGSTATE1:%.*]] = getelementptr inbounds nuw [[STRUCT_DOGGO]], ptr [[D1]], i32 0, i32 0
@@ -743,6 +752,7 @@ Zoo case9(Doggo D1, AnimalBits A1) {
// CHECK-LABEL: define hidden void @_Z6case109TwoFloatsS_(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS:%.*]]) align 1 [[TF1:%.*]], ptr noundef byval([[STRUCT_TWOFLOATS]]) align 1 [[TF2:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
// CHECK-NEXT: [[X1:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[TF1]], i32 0, i32 0
// CHECK-NEXT: [[TMP0:%.*]] = load float, ptr [[X1]], align 1
@@ -770,6 +780,7 @@ FourFloats case10(TwoFloats TF1, TwoFloats TF2) {
// CHECK-LABEL: define hidden void @_Z6case11f(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], float noundef nofpclass(nan inf) [[F:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[F_ADDR:%.*]] = alloca float, align 4
// CHECK-NEXT: [[REF_TMP:%.*]] = alloca <4 x float>, align 4
// CHECK-NEXT: [[REF_TMP1:%.*]] = alloca <4 x float>, align 4
@@ -819,6 +830,7 @@ FourFloats case11(float F) {
// CHECK-LABEL: define hidden void @_Z6case12ii(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_SLICYBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], i32 noundef [[I:%.*]], i32 noundef [[J:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[I_ADDR:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[J_ADDR:%.*]] = alloca i32, align 4
// CHECK-NEXT: store i32 [[I]], ptr [[I_ADDR]], align 4
@@ -841,6 +853,7 @@ SlicyBits case12(int I, int J) {
// CHECK-LABEL: define hidden void @_Z6case137TwoInts(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_SLICYBITS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_TWOINTS:%.*]]) align 1 [[TI:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[TI]], i32 0, i32 0
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Z]], align 1
// CHECK-NEXT: [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
@@ -861,6 +874,7 @@ SlicyBits case13(TwoInts TI) {
// CHECK-LABEL: define hidden void @_Z6case149SlicyBits(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOINTS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_SLICYBITS:%.*]]) align 1 [[SB:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[Z:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOINTS]], ptr [[AGG_RESULT]], i32 0, i32 0
// CHECK-NEXT: [[BF_LOAD:%.*]] = load i8, ptr [[SB]], align 1
// CHECK-NEXT: [[BF_CAST:%.*]] = sext i8 [[BF_LOAD]] to i32
@@ -881,6 +895,7 @@ TwoInts case14(SlicyBits SB) {
// CHECK-LABEL: define hidden void @_Z6case159SlicyBits(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noundef byval([[STRUCT_SLICYBITS:%.*]]) align 1 [[SB:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
// CHECK-NEXT: [[BF_LOAD:%.*]] = load i8, ptr [[SB]], align 1
// CHECK-NEXT: [[BF_CAST:%.*]] = sext i8 [[BF_LOAD]] to i32
@@ -904,6 +919,7 @@ TwoFloats case15(SlicyBits SB) {
// CHECK-LABEL: define hidden void @_Z7makeTwoRf(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_TWOFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]], ptr noalias noundef nonnull align 4 dereferenceable(4) [[X:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[X_ADDR:%.*]] = alloca ptr, align 4
// CHECK-NEXT: store ptr [[X]], ptr [[X_ADDR]], align 4
// CHECK-NEXT: [[X1:%.*]] = getelementptr inbounds nuw [[STRUCT_TWOFLOATS]], ptr [[AGG_RESULT]], i32 0, i32 0
@@ -930,6 +946,7 @@ TwoFloats makeTwo(inout float X) {
// CHECK-LABEL: define hidden void @_Z6case16v(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_FOURFLOATS:%.*]]) align 1 [[AGG_RESULT:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[X:%.*]] = alloca float, align 4
// CHECK-NEXT: [[REF_TMP:%.*]] = alloca [[STRUCT_TWOFLOATS:%.*]], align 1
// CHECK-NEXT: [[TMP:%.*]] = alloca float, align 4
@@ -963,6 +980,7 @@ FourFloats case16() {
// CHECK-LABEL: define hidden noundef i32 @_Z12case17Helperi(
// CHECK-SAME: i32 noundef [[X:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[X_ADDR:%.*]] = alloca i32, align 4
// CHECK-NEXT: store i32 [[X]], ptr [[X_ADDR]], align 4
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
@@ -976,6 +994,7 @@ int case17Helper(int x) {
// CHECK-LABEL: define hidden void @_Z6case17v(
// CHECK-SAME: ) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: %[[#C_ENTRY:]] = call token @llvm.experimental.convergence.entry()
// CHECK-NEXT: [[X:%.*]] = alloca <2 x i32>, align 4
// CHECK-NEXT: [[CALL:%.*]] = call noundef i32 @_Z12case17Helperi(i32 noundef 0) #[[ATTR2]]
// CHECK-NEXT: [[CALL1:%.*]] = call noundef i32 @_Z12case17Helperi(i32 nou...
[truncated]
bob80905 left a comment
Looks good to my untrained eye. I would recommend updating the new test descriptions, and maybe even being a bit verbose with them.
The `SPIRVStripConvergenceIntrinsic` pass was written as a SPIR-V pass because SPIR-V is currently the only target that emits convergence tokens during codegen. There is nothing target-specific about the pass, and we plan to emit convergence tokens when targeting DirectX (and all targets in general), so move the pass to a common place. The previous pass used temporary `Undef`s; as part of moving the pass, we can simply reverse the traversal order to remove the use of `Undef`, which is deprecated. Enables the pass for targeting DirectX and is a pre-req for: #188792. Assisted by: Github Copilot
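Why reversing the traversal order removes the need for a temporary `Undef` can be illustrated with a toy model. It is not the LLVM pass or its API: `Inst`, `addBundle`, `stripOne`, `stripConvergenceReverse`, and `stripConvergenceForward` are hypothetical, and the model assumes every token is defined before all of its users in the order being walked. Walking backwards, each user drops its `convergencectrl` reference before the defining intrinsic is reached, so the intrinsic has no uses left when it is erased; walking forwards reaches the entry token while it still has users, which is where a placeholder value would otherwise be needed.

```cpp
#include <cstddef>
#include <vector>

// Toy instruction model (hypothetical, not LLVM IR). An instruction may be a
// convergence intrinsic (it defines a token) and may reference one token via
// a "convergencectrl" bundle.
struct Inst {
  bool isConvergenceIntrinsic = false;
  int bundleToken = -1; // index of the referenced token definition, or -1
  int uses = 0;         // live bundle references to this instruction
  bool erased = false;
};

// Records that instruction `user` references the token defined by `token`.
void addBundle(std::vector<Inst> &insts, std::size_t user, std::size_t token) {
  insts[user].bundleToken = static_cast<int>(token);
  ++insts[token].uses;
}

// Strips the bundle from instruction `i` and erases it if it is a
// convergence intrinsic. Returns false if the intrinsic still has users,
// i.e. erasing it now would need a placeholder value for those users.
static bool stripOne(std::vector<Inst> &insts, std::size_t i) {
  Inst &I = insts[i];
  if (I.bundleToken >= 0) {
    --insts[static_cast<std::size_t>(I.bundleToken)].uses;
    I.bundleToken = -1;
  }
  if (I.isConvergenceIntrinsic) {
    if (I.uses != 0)
      return false;
    I.erased = true;
  }
  return true;
}

// Walks from the last instruction to the first: users are visited before
// the tokens they reference, so every token is unused when it is erased.
bool stripConvergenceReverse(std::vector<Inst> &insts) {
  for (std::size_t i = insts.size(); i-- > 0;)
    if (!stripOne(insts, i))
      return false;
  return true;
}

// Forward walk for comparison: fails as soon as it reaches a token that
// still has users later in the sequence.
bool stripConvergenceForward(std::vector<Inst> &insts) {
  for (std::size_t i = 0; i < insts.size(); ++i)
    if (!stripOne(insts, i))
      return false;
  return true;
}
```

For example, with an entry token at index 0, a loop token at index 1 that references it, and two calls referencing the loop and entry tokens, the reverse walk erases both tokens while the forward walk stops at index 0.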
@Keenuts I will merge this tomorrow after I confirm the HLSL test suite is passing as expected. Let me know if I should hold until you get a chance to review. Thanks!
…ls (#188537) The `SPIRVStripConvergenceIntrinsic` pass was written as a SPIR-V pass because SPIR-V is currently the only target that emits convergence tokens during codegen. There is nothing target-specific about the pass, and we plan to emit convergence tokens when targeting DirectX (and all targets in general), so move the pass to a common place. The previous pass used temporary `Undef`s; as part of moving the pass, we can simply reverse the traversal order to remove the use of `Undef`, which is deprecated. Enables the pass for targeting DirectX and is a pre-req for: llvm/llvm-project#188792. Assisted by: Github Copilot
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.
These changes were introduced between when the PR was opened and now; fixing them in the same manner as before.
hekota left a comment
Just a few nits in the test changes.
 Texture2D<float4> t;

-// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #1 {
+// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #2 {
Suggested change:
-// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc) #2 {
+// CHECK: define internal {{.*}} <4 x float> @test_mips(float vector[2])(<2 x float> {{.*}} %loc)
 // CHECK: %hlsl.f32tof16 = call i32 @llvm.dx.legacyf32tof16.f32(float %[[#]])
 // CHECK: ret i32 %hlsl.f32tof16
-// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #1
+// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #2
Suggested change:
-// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float) #2
+// CHECK: declare i32 @llvm.dx.legacyf32tof16.f32(float)
The function attributes are not important and can be removed. This applies to all of the other files you needed to update that have the same pattern.
// CHECK: %call1 = call noundef i32 @_ZN4Pair8getFirstEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals) #{{[0-9]+}} [ "convergencectrl"(token %[[#C_ENTRY]]) ]
// CHECK-NEXT: %First = getelementptr inbounds nuw %struct.Pair, ptr %Vals, i32 0, i32 0
// CHECK-NEXT: store i32 %call, ptr %First, align 1
// CHECK-NEXT: %call1 = call reassoc nnan ninf nsz arcp afn noundef nofpclass(nan inf) float @_ZN4Pair9getSecondEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals)
// CHECK-NEXT: store i32 %call1, ptr %First, align 1
// CHECK-NEXT: %call2 = call reassoc nnan ninf nsz arcp afn noundef nofpclass(nan inf) float @_ZN4Pair9getSecondEv(ptr noundef nonnull align 1 dereferenceable(8) %Vals) #{{[0-9]+}} [ "convergencectrl"(token %[[#C_ENTRY]]) ]
This could use a regex for call1 and call2.
…8537) The `SPIRVStripConvergenceIntrinsic` pass was written as a SPIR-V pass because SPIR-V is currently the only target that emits convergence tokens during codegen. There is nothing target-specific about the pass, and we plan to emit convergence tokens when targeting DirectX (and all targets in general), so move the pass to a common place. The previous pass used temporary `Undef`s; as part of moving the pass, we can simply reverse the traversal order to remove the use of `Undef`, which is deprecated. Enables the pass for targeting DirectX and is a pre-req for: llvm#188792. Assisted by: Github Copilot
…g DirectX" (#193090) This change appears to introduce complications when trying to do a full loop unroll, as exhibited here: https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618. This results in invalid DXIL, as the unreachable branch is not correctly cleaned up. Initial leads suggest this is because the instructions with convergence control tokens are still being used for analysis when they are within an unreachable branch. Reverts #188792
…en targeting DirectX" (#193090) This change appears to introduce complications when trying to do a full loop unroll, as exhibited here: https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618. This results in invalid DXIL, as the unreachable branch is not correctly cleaned up. Initial leads suggest this is because the instructions with convergence control tokens are still being used for analysis when they are within an unreachable branch. Reverts llvm/llvm-project#188792
llvm#188792) This PR allows codegen to generate convergence control tokens. These give a more accurate description of convergence behaviour, so that invalid control flow graph transforms are prevented (and valid ones remain allowed). As noted, the use of convergence control tokens is the ideal norm, and this PR follows that by enabling it for `DirectX`. It is being done now to prevent a convergent loop exit condition from being illegally moved across control flow; test cases for this are explicitly added. Please see the individual commits, which group logically similar changes. Unfortunately, it is tricky to stage this in smaller individual commits. Resolves llvm#180621. llvm#188537 is a prerequisite for this to pass the HLSL offload suite tests. Assisted by: Github Copilot
…rge attempt) (#193584) Any expression that accesses a resource or resource array member of a global struct instance must be replaced during codegen by an access to the corresponding implicit global resource variable. When codegen encounters a `MemberExpr` of a resource type, it traverses the AST to locate the parent struct declaration, building the expected global resource variable name along the way. If the parent declaration is a non-static global struct instance, codegen searches its `HLSLAssociatedResourceDeclAttr` attributes to locate the matching global resource variable and then generates IR code to access the resource global in place of the member access. Fixes #182989. This is the second attempt to land this. The [first one](#187127 with #188792 and both PRs had to be reverted. No updates were needed to this change. I synced with @inbelic and we agreed that this one should go in first.
…ly` and `IntrReadMem` (#193593) `IntrConvergent` was originally added to `dx.resource.getpointer` to prevent optimization passes (`SimplifyCFG`, `GVN`) from sinking the intrinsic out of control flow branches, which would create phi nodes on the returned pointer. Using `IntrInaccessibleMemOnly` and `IntrReadMem` semantics still prevents passes from merging or sinking identical calls across branches; however, it allows the call to be moved within a single control flow path. Updates relevant tests and adds a new test to demonstrate a now-legal potential optimization. This was discovered when #188792 caused the following failure: https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618. When emitting convergence control tokens, each resource access becomes a user of the convergence control tokens, which makes it unnecessarily restrictive for optimizations and, in this case, would prevent a loop unroll from taking place. Assisted by: Claude Opus 4.6
…ssibleMemOnly` and `IntrReadMem` (#193593) `IntrConvergent` was originally added to `dx.resource.getpointer` to prevent optimization passes (`SimplifyCFG`, `GVN`) from sinking the intrinsic out of control flow branches, which would create phi nodes on the returned pointer. Using `IntrInaccessibleMemOnly` and `IntrReadMem` semantics still prevents passes from merging or sinking identical calls across branches; however, it allows the call to be moved within a single control flow path. Updates relevant tests and adds a new test to demonstrate a now-legal potential optimization. This was discovered when llvm/llvm-project#188792 caused the following failure: https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618. When emitting convergence control tokens, each resource access becomes a user of the convergence control tokens, which makes it unnecessarily restrictive for optimizations and, in this case, would prevent a loop unroll from taking place. Assisted by: Claude Opus 4.6
…rge attempt) (llvm#193584) Any expression that accesses a resource or resource array member of a global struct instance must be replaced during codegen by an access of the corresponding implicit global resource variable. When codegen encounters a `MemberExpr` of a resource type, it traverses the AST to locate the parent struct declaration, building the expected global resource variable name along the way. If the parent declaration is a non-static global struct instance, codegen searches its `HLSLAssociatedResourceDeclAttr` attributes to locate the matching global resource variable and then generates IR code to access the resource global in place of the member access. Fixes llvm#182989 This is the second try to land this. The [first one](llvm#187127 with llvm#188792 and both PRs had to be reverted. No updates were needed to this change. I synced with @inbelic and we agreed that this one should go in first.
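The member-to-global lookup described in this commit can be sketched as a small, self-contained model. This is a hypothetical illustration, not Clang's code: `VarDecl` and `Expr` are simplified stand-ins for the AST nodes, `associatedResources` models the globals recorded by `HLSLAssociatedResourceDeclAttr`, and the dot-joined naming scheme (`s.inner.tex`) is an assumption. Resource arrays and array indexing are not modeled.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a variable declaration.
struct VarDecl {
  std::string name;
  bool isGlobal;
  bool isStatic;
  // Models the implicit resource globals recorded on the instance by
  // HLSLAssociatedResourceDeclAttr (one name per resource member).
  std::vector<std::string> associatedResources;
};

// Hypothetical stand-in for an expression: either a reference to a variable
// (decl set, base null) or a member access on another expression (base set).
struct Expr {
  const VarDecl *decl = nullptr;
  const Expr *base = nullptr;
  std::string member;
};

// Walks from a resource-typed member access up to the parent declaration,
// building the expected global resource name along the way, then searches
// the declaration's associated resources for a match.
std::optional<std::string> findResourceGlobal(const Expr &access) {
  std::string path;
  const Expr *cur = &access;
  while (cur->base) {
    path = "." + cur->member + path; // prepend: we walk from leaf to root
    cur = cur->base;
  }
  const VarDecl *root = cur->decl;
  assert(root && "member chain must end at a variable reference");
  if (!root->isGlobal || root->isStatic)
    return std::nullopt; // only non-static global instances are handled
  const std::string expected = root->name + path;
  for (const std::string &res : root->associatedResources)
    if (res == expected)
      return res;
  return std::nullopt;
}
```

For a non-static global `s` whose associated resources are `s.buf` and `s.inner.tex`, calling `findResourceGlobal` on the expression `s.inner.tex` returns `"s.inner.tex"`, while the same access through a static instance yields `std::nullopt`, reflecting the non-static global condition in the description above.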
llvm#188792) This PR allows codegen to generate convergence control tokens, which give a more accurate description of convergence behaviour so that invalid control flow graph transforms are prevented (and valid ones allowed). As noted, the use of convergence control tokens is the ideal norm, and this follows that by enabling it for `DirectX`. The immediate motivation is preventing a convergent exit condition of a loop from being illegally moved across control flow; test cases for this are explicitly added. Please see the individual commits for logically similar chunks. Unfortunately, it is tricky to stage this in smaller individual commits. Resolves llvm#180621. llvm#188537 is a prerequisite for this change to pass the HLSL offload test suite. Assisted by: Github Copilot
…g DirectX" (llvm#193090) This change appears to introduce complications when performing a full loop unroll, as exhibited here: https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618. This results in invalid DXIL because the unreachable branch is not correctly cleaned up. Initial investigation suggests this is because instructions using convergence control tokens are still considered by analyses even when they are within an unreachable branch. Reverts llvm#188792
…g DirectX" (#194452) The initial landing surfaced 3 somewhat orthogonal issues related to loop unrolling. These are addressed [here](#193592), [here](#193593) and [here](#193590). In combination, they caused these [tests](https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618#step:8:87913) to fail in the offload test suite. We can verify that these tests now pass as expected (fixing any one of the 3 issues would resolve the failure and allow us to reland). Some tests added since the revert are accounted for and updated in the reland fixes commit. This relands #188792
This PR allows codegen to generate convergence control tokens, which give a more accurate description of convergence behaviour so that invalid control flow graph transforms are prevented (and valid ones allowed). As noted, the use of convergence control tokens is the ideal norm, and this follows that by enabling it for `DirectX`.
The immediate motivation is preventing a convergent exit condition of a loop from being illegally moved across control flow. Test cases for this are explicitly added.
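Why a convergent operation must stay where the source placed it can be shown with a small CPU-side model of a wave. This is a hypothetical sketch, not LLVM's convergence machinery or the DirectX backend: `waveActiveCount` stands in for a wave intrinsic such as HLSL's `WaveActiveCountBits(true)`, each lane leaves the loop after its own trip count, and `convergentHoisted` plays the role of an illegal transform that moves the convergent call out of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a wave: lanes run in lockstep, and a convergent
// operation observes only the lanes that are active when it executes.
using Mask = std::vector<bool>;

// Stand-in for a wave intrinsic such as WaveActiveCountBits(true):
// returns the number of currently active lanes.
int waveActiveCount(const Mask &active) {
  int n = 0;
  for (bool a : active)
    n += a ? 1 : 0;
  return n;
}

// Source order: the convergent call sits inside the loop, so each iteration
// sees only the lanes that have not yet left it. Lane i runs bounds[i] times.
std::vector<int> convergentInLoop(const std::vector<int> &bounds) {
  const std::size_t lanes = bounds.size();
  std::vector<int> acc(lanes, 0), iter(lanes, 0);
  Mask active(lanes, false);
  for (std::size_t i = 0; i < lanes; ++i) {
    assert(bounds[i] >= 0 && "trip counts must be non-negative");
    active[i] = bounds[i] > 0;
  }
  while (waveActiveCount(active) > 0) {        // loop while any lane remains
    const int count = waveActiveCount(active); // convergent call
    for (std::size_t i = 0; i < lanes; ++i) {
      if (!active[i])
        continue;
      acc[i] += count;
      if (++iter[i] == bounds[i])
        active[i] = false; // this lane exits the loop
    }
  }
  return acc;
}

// Illegal transform: the convergent call is hoisted above the loop, so it
// is evaluated once with every lane active.
std::vector<int> convergentHoisted(const std::vector<int> &bounds) {
  const int count = waveActiveCount(Mask(bounds.size(), true));
  std::vector<int> acc;
  for (int b : bounds)
    acc.push_back(b * count);
  return acc;
}
```

With per-lane trip counts `{1, 2, 3, 4}`, `convergentInLoop` returns `{4, 7, 9, 10}` while `convergentHoisted` returns `{4, 8, 12, 16}`: moving the call changes which lanes participate, which is the class of transform that convergence annotations are intended to rule out.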
Please see the individual commits for logically similar chunks. Unfortunately, it is tricky to stage this in smaller individual commits.
Resolves #180621.
#188537 is a prerequisite for this change to pass the HLSL offload test suite.
Assisted by: Github Copilot