Consider the following:
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
    for (uint i = 0; i < 8; i++) {
        if (i == TID.x) {
            Out[TID.x] = WaveActiveMax(TID.x);
            break;
        }
    }
}
Clang currently optimizes away the loop through a series of passes (LoopRotatePass -> IndVarSimplifyPass -> SimpleLoopUnswitchPass -> ...) such that it becomes equivalent to the following:
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
    if (TID.x < 8) {
        Out[TID.x] = WaveActiveMax(TID.x);
    }
}
The expected behaviour is that only a single lane in the wave is active in each iteration of the for loop, and therefore at each invocation of the convergent op. The optimization must not remove the loop in the convergent case: once the loop is gone, all 8 lanes are active at the point of invocation, which is not equivalent.
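To make the difference concrete, here is a minimal CPU-side model of the two shaders. It is a sketch, not HLSL and not how a GPU schedules lanes. It assumes all 8 threads of the group land in one wave with lane index equal to TID.x, and the helper names (`wave_active_max`, `run_loop_version`, `run_flattened_version`) are made up for illustration.

```python
# Toy model of one 8-lane wave. Assumption: lane index == TID.x and the
# whole thread group fits in a single wave.
WAVE_SIZE = 8


def wave_active_max(values, active):
    """Max of `values` over the lanes that are active at the call site."""
    return max(values[lane] for lane in active)


def run_loop_version():
    """Original shader: each lane reaches the wave op in its own iteration."""
    tid = list(range(WAVE_SIZE))
    out = [None] * WAVE_SIZE
    running = set(range(WAVE_SIZE))  # lanes still executing the loop
    for i in range(WAVE_SIZE):
        # Only lanes that take the `i == TID.x` branch are active at the call.
        active = {lane for lane in running if tid[lane] == i}
        if active:
            result = wave_active_max(tid, active)
            for lane in active:
                out[lane] = result
        running -= active  # `break` takes these lanes out of the loop
    return out


def run_flattened_version():
    """Optimized shader: every lane passes `TID.x < 8` and calls together."""
    tid = list(range(WAVE_SIZE))
    out = [None] * WAVE_SIZE
    active = {lane for lane in range(WAVE_SIZE) if tid[lane] < 8}
    result = wave_active_max(tid, active)
    for lane in active:
        out[lane] = result
    return out


print(run_loop_version())       # [0, 1, 2, 3, 4, 5, 6, 7]
print(run_flattened_version())  # [7, 7, 7, 7, 7, 7, 7, 7]
```

Under these assumptions the loop version writes each lane's own TID.x, while the flattened version writes 7 everywhere, so the transform changes the observable contents of Out.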
The SPIR-V code generation path appears to account for this by attaching convergencectrl attributes (operand bundles carrying convergence tokens) to the operations. This is demonstrated here: https://godbolt.org/z/s53W8coWv, where we see that LoopRotatePass cannot modify the control flow because of the convergencectrl attributes.
AC:
- Update HLSL codegen to emit convergencectrl attributes
- Verify whether we are required to remove these at a later stage (maybe dxil-op-lower), or whether they are simply ignored
- Add tests for codegen, and tests that the final generated DXIL contains no unknown instructions/attrs
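As a rough illustration of the last check, the sketch below scans textual IR for leftover convergence control constructs. It is not a proposed test harness, and real coverage would more likely be lit tests. The function name `find_convergence_leftovers` and the `@hypothetical.*` callees in the samples are invented for this example; the only names taken from LLVM are the `llvm.experimental.convergence.` intrinsic prefix and the `"convergencectrl"` bundle tag.

```python
import re

# Markers from LLVM's convergence control: the intrinsic name prefix and
# the operand bundle tag as it is printed in textual IR.
CONVERGENCE_MARKERS = re.compile(
    r'llvm\.experimental\.convergence\.|"convergencectrl"'
)


def find_convergence_leftovers(ir_text):
    """Return (line_number, line) pairs that still mention convergence control."""
    return [
        (number, line.strip())
        for number, line in enumerate(ir_text.splitlines(), start=1)
        if CONVERGENCE_MARKERS.search(line)
    ]


# Hypothetical IR snippets; the callee names are placeholders.
sample_before_lowering = """\
%t = call token @llvm.experimental.convergence.entry()
%r = call i32 @hypothetical.wave.op(i32 %x) [ "convergencectrl"(token %t) ]
"""

sample_after_lowering = """\
%r = call i32 @hypothetical.lowered.op(i32 %x)
"""

print(len(find_convergence_leftovers(sample_before_lowering)))  # 2
print(find_convergence_leftovers(sample_after_lowering))        # []
```

A non-empty result for the final DXIL would indicate that convergence tokens or bundles survived lowering.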