[HLSL][DirectX] Loop is removed when exit condition is convergent #180621

@inbelic

Consider the following:

RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
    for (uint i = 0; i < 8; i++) {
        if (i == TID.x) {
            Out[TID.x] = WaveActiveMax(TID.x);
            break;
        }
    }
}

Clang currently optimizes away the loop through a series of passes (LoopRotatePass -> IndVarSimplifyPass -> SimpleLoopUnswitchPass -> ...) such that it becomes equivalent to the following:

RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
    if (TID.x < 8) {
        Out[TID.x] = WaveActiveMax(TID.x);
    }
}

The expected behaviour is that exactly one lane in the wave is active in the branch on each iteration of the for loop, and therefore at each invocation of the convergent op, so WaveActiveMax(TID.x) sees only that lane's own TID.x. The optimization must therefore not remove the loop in the convergent case: once the loop is gone, all 8 lanes are active at the point of invocation, which is not equivalent.
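
To make the difference concrete, here is a minimal CPU-side sketch in C++ (not HLSL) that simulates both versions of the shader. It assumes that all 8 threads of the group land in a single wave and that the lane index equals TID.x. The names `waveActiveMax`, `runWithLoop` and `runWithoutLoop` are hypothetical helpers for this illustration, not part of any real API.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Matches [numthreads(8,1,1)]; assumption: one wave covers the whole group.
constexpr uint32_t kLanes = 8;

// Hypothetical stand-in for WaveActiveMax(TID.x): the maximum lane id over
// the lanes that are active at the call site (lane id == TID.x by assumption).
uint32_t waveActiveMax(const std::vector<uint32_t>& activeLanes) {
    return *std::max_element(activeLanes.begin(), activeLanes.end());
}

// Original shader: on iteration i only the lane with TID.x == i enters the
// branch, calls the wave op, writes its result, and breaks out of the loop.
std::array<uint32_t, kLanes> runWithLoop() {
    std::array<uint32_t, kLanes> out{};
    std::array<bool, kLanes> exited{};
    for (uint32_t i = 0; i < kLanes; ++i) {
        std::vector<uint32_t> active;
        for (uint32_t lane = 0; lane < kLanes; ++lane) {
            if (!exited[lane] && lane == i) active.push_back(lane);
        }
        if (active.empty()) continue;
        const uint32_t result = waveActiveMax(active);
        for (uint32_t lane : active) {
            out[lane] = result;
            exited[lane] = true;  // models the `break`
        }
    }
    return out;
}

// Optimized shader: the loop is gone, so every lane with TID.x < 8 reaches
// the wave op at the same time.
std::array<uint32_t, kLanes> runWithoutLoop() {
    std::array<uint32_t, kLanes> out{};
    std::vector<uint32_t> active;
    for (uint32_t lane = 0; lane < kLanes; ++lane) {
        if (lane < 8) active.push_back(lane);
    }
    const uint32_t result = waveActiveMax(active);
    for (uint32_t lane : active) out[lane] = result;
    return out;
}
```

Under these assumptions, `runWithLoop()` yields `Out[t] == t` for every lane, while `runWithoutLoop()` yields `Out[t] == 7` for every lane, which is the observable difference the optimization introduces.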

The SPIR-V code generation path appears to account for this by using convergencectrl attributes on the operations. This is demonstrated here: https://godbolt.org/z/s53W8coWv, where LoopRotatePass cannot modify the control flow because of the convergencectrl attributes.

Acceptance criteria:

  • Update HLSL codegen to emit convergencectrl attributes
  • Verify whether these must be removed at a later stage (maybe dxil-op-lower) or whether they are simply ignored
  • Add tests that cover code gen and that check the final generated DXIL does not contain unknown instructions/attrs

Status
Closed

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions