
Compute pass recording with a large number of (read/write) resources is very slow due to barrier emitting between dispatch calls #5766

@Wumpf


This is particularly a problem for bindless workflows. Compute passes typically have to emit many barriers between dispatch calls to make sure that reads and writes from one dispatch don't affect the next when both may use the same resource.

Timings for issuing 1000 dispatches with 6000 resources bound once:

Computepass: Bindless/1000 dispatch
                        time:   [139.48 ms 140.17 ms 141.25 ms]
                        thrpt:  [7.0795 Kelem/s 7.1343 Kelem/s 7.1692 Kelem/s]
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe

For comparison, on the same machine, issuing 10x the dispatches, each using 6 resources, with a bind before each dispatch:

Computepass: Single Threaded/1 computepasses x 10000 dispatches (Computepass Time)
                        time:   [19.353 ms 19.565 ms 19.792 ms]
                        thrpt:  [505.25 Kelem/s 511.11 Kelem/s 516.72 Kelem/s]

The bindless version is necessarily slower since it has to emit a lot more barriers speculatively and there's no way around it really. But it would be surprising if we couldn't do a lot better.
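To get a rough feel for why the bindless case is so much more expensive, here is a toy model of the bookkeeping. It is not wgpu's actual tracker: `Access` and `record` are hypothetical names, and the model simply assumes that every bound resource is visited before each dispatch and gets a barrier whenever an earlier dispatch may have written to it. It also assumes the comparison case reuses the same 6 resources for every dispatch.

```rust
/// Last known access to a resource in this toy model.
#[derive(Clone, Copy, PartialEq)]
enum Access {
    None,
    ReadWrite,
}

/// Hypothetical tracker (an assumption, not wgpu's implementation):
/// before each dispatch, visit every bound resource and count a barrier
/// if a previous dispatch may have written to it.
/// Returns (resource visits, barriers counted).
fn record(dispatches: usize, bound_resources: usize) -> (u64, u64) {
    let mut last = vec![Access::None; bound_resources];
    let (mut visits, mut barriers) = (0u64, 0u64);
    for _ in 0..dispatches {
        for state in last.iter_mut() {
            visits += 1;
            if *state == Access::ReadWrite {
                barriers += 1;
            }
            // The dispatch may read or write any bound resource.
            *state = Access::ReadWrite;
        }
    }
    (visits, barriers)
}

fn main() {
    let (bindless_visits, bindless_barriers) = record(1_000, 6_000);
    let (rebind_visits, rebind_barriers) = record(10_000, 6);
    println!("bindless: {bindless_visits} visits, {bindless_barriers} barriers");
    println!("rebinding: {rebind_visits} visits, {rebind_barriers} barriers");
}
```

Under these assumptions the bindless case does about 100x the per-resource work for a tenth of the dispatches, so some slowdown is expected. The model says nothing about how cheap each visit could be made.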

On the same machine, the corresponding renderpass test runs with more than 10x the resources & draw calls 10x faster (resulting in a 100x throughput of draw calls):

Renderpass: Bindless/10000 draws
                        time:   [11.720 ms 11.820 ms 11.921 ms]
                        thrpt:  [838.88 Kelem/s 846.04 Kelem/s 853.28 Kelem/s]

(There are only write-only resources involved here, so the comparison isn't quite accurate.)
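As a sanity check on the throughput figures, the numbers can be recomputed from the median timings quoted above. This is only a sketch: `kelem_per_sec` is a hypothetical helper, and it assumes one element per dispatch or draw call.

```rust
/// Throughput in thousands of elements per second.
/// Elements per millisecond is numerically the same as Kelem/s.
fn kelem_per_sec(elements: f64, millis: f64) -> f64 {
    elements / millis
}

fn main() {
    // Median timings taken from the benchmark output in this issue.
    let compute_bindless = kelem_per_sec(1_000.0, 140.17);
    let compute_rebinding = kelem_per_sec(10_000.0, 19.565);
    let render_bindless = kelem_per_sec(10_000.0, 11.820);

    println!("compute, bindless:  {compute_bindless:.2} Kelem/s");
    println!("compute, rebinding: {compute_rebinding:.2} Kelem/s");
    println!("render, bindless:   {render_bindless:.2} Kelem/s");
    println!(
        "render vs. compute (bindless): {:.0}x",
        render_bindless / compute_bindless
    );
}
```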
