Compute pass benchmark by Wumpf · Pull Request #5767 · gfx-rs/wgpu

Wumpf · 2024-06-02T09:03:00Z

Connections

- Follow-up to Add Benchmarks #5694
- Lots of bound resources make compute passes very slow: Compute pass recording with a large number of (read/write) resources is very slow due to barrier emitting between dispatch calls #5766

Description
Adds a benchmark for compute pass recording, very similar to what we have for render passes.
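The benchmark labels in the results suggest a fixed budget of 10000 dispatches split across 1, 2, 4 or 8 passes, with recording ("Computepass Time") and submission ("Submit Time") timed separately. A minimal std-only sketch of that structure follows. It is not the benchmark's actual code: `pass_configurations` and `time_phases` are hypothetical helpers, and the `record`/`submit` closures stand in for the real wgpu calls.

```rust
use std::time::{Duration, Instant};

/// Splits a fixed dispatch budget evenly across each pass count, mirroring
/// labels such as "4 computepasses x 2500 dispatches".
fn pass_configurations(total_dispatches: u32, pass_counts: &[u32]) -> Vec<(u32, u32)> {
    pass_counts
        .iter()
        .map(|&passes| {
            // Assumption: every pass count divides the budget evenly.
            assert_eq!(total_dispatches % passes, 0, "pass count must divide the total");
            (passes, total_dispatches / passes)
        })
        .collect()
}

/// Times recording and submission separately, like the separate
/// "Computepass Time" and "Submit Time" results.
/// `record` and `submit` are placeholders for the real GPU work.
fn time_phases<T>(record: impl FnOnce() -> T, submit: impl FnOnce(T)) -> (Duration, Duration) {
    let start = Instant::now();
    let recorded = record();
    let record_time = start.elapsed();

    let start = Instant::now();
    submit(recorded);
    let submit_time = start.elapsed();

    (record_time, submit_time)
}
```

Keeping the total constant means the rows differ only in how the same dispatches are distributed over passes.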

The prime motivation for this was to figure out whether the extensive changes I made to compute pass recording made performance better or worse; there are good reasons to expect either. The short answer: pass time improved by 4-10% compared to before I started! 🥳
Even better, when submit time is included the improvements are 10-30%, but this is very likely not related to the compute pass recording refactors :)

Unfortunately, those changes landed over quite a long period of time, so unless someone bisects this carefully we won't know exactly what caused it. It could be that the "fully consume the pass" change caused these improvements (we now make use of the fact that a pass can't be submitted twice), but then again this is probably a wash: before the compute pass lifetimes refactor work started, the compute pass was a very simple data structure, whereas now it has extensive resource ownership. So it's just as likely that something else caused this.
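The "fully consume the pass" trade-off can be illustrated with a generic Rust ownership sketch. The types `ComputePass`, `Buffer`, `Command` and `FinishedPass` are hypothetical and heavily simplified; they are not wgpu's internals. A finishing method that borrows the pass has to clone the recorded commands and bump every resource's reference count, while one that takes `self` can simply move them out, since the pass can never be used again.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; these are NOT wgpu's internal types.
#[derive(Debug)]
struct Buffer {
    label: String,
}

#[derive(Debug, Clone)]
enum Command {
    Dispatch { x: u32, y: u32, z: u32 },
}

/// A pass that owns its command list and keeps every bound resource alive.
struct ComputePass {
    commands: Vec<Command>,
    resources: Vec<Arc<Buffer>>,
}

/// What the encoder ends up holding once the pass is finished.
struct FinishedPass {
    commands: Vec<Command>,
    resources: Vec<Arc<Buffer>>,
}

impl ComputePass {
    fn new() -> Self {
        Self { commands: Vec::new(), resources: Vec::new() }
    }

    fn bind(&mut self, buffer: &Arc<Buffer>) {
        // Resource ownership: each binding holds another strong reference.
        self.resources.push(Arc::clone(buffer));
    }

    fn dispatch(&mut self, x: u32, y: u32, z: u32) {
        self.commands.push(Command::Dispatch { x, y, z });
    }

    /// Borrowing variant: the pass stays usable, so everything is copied.
    fn finish_by_ref(&self) -> FinishedPass {
        FinishedPass { commands: self.commands.clone(), resources: self.resources.clone() }
    }

    /// Consuming variant: the vectors are moved out without copying and no
    /// reference counts change. Using the pass afterwards does not compile.
    fn finish(self) -> FinishedPass {
        FinishedPass { commands: self.commands, resources: self.resources }
    }
}
```

The flip side, matching the "probably a wash" remark, is that the pass now holds a strong reference per bound resource, a cost the old, simpler structure did not pay.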
For this comparison, I backported the benchmarks to c1291bd. To check it out yourself, use the before-computepass-work-with-benches branch on my fork.

Raw results comparing c1291bd1312a77be73954856d0e7728877232033 against this branch:

Computepass: Single Threaded/1 computepasses x 10000 dispatches (Computepass Time)
                        time:   [18.441 ms 18.719 ms 19.010 ms]
                        thrpt:  [526.03 Kelem/s 534.23 Kelem/s 542.28 Kelem/s]
                 change:
                        time:   [-6.1982% -4.3270% -2.4471%] (p = 0.00 < 0.05)
                        thrpt:  [+2.5085% +4.5227% +6.6077%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
Computepass: Single Threaded/2 computepasses x 5000 dispatches (Computepass Time)
                        time:   [18.392 ms 18.560 ms 18.735 ms]
                        thrpt:  [533.77 Kelem/s 538.80 Kelem/s 543.73 Kelem/s]
                 change:
                        time:   [-8.6884% -7.5122% -6.2705%] (p = 0.00 < 0.05)
                        thrpt:  [+6.6900% +8.1224% +9.5151%]
                        Performance has improved.
Computepass: Single Threaded/4 computepasses x 2500 dispatches (Computepass Time)
                        time:   [19.154 ms 19.341 ms 19.535 ms]
                        thrpt:  [511.89 Kelem/s 517.04 Kelem/s 522.08 Kelem/s]
                 change:
                        time:   [-13.050% -11.257% -9.5528%] (p = 0.00 < 0.05)
                        thrpt:  [+10.562% +12.685% +15.008%]
                        Performance has improved.
Computepass: Single Threaded/8 computepasses x 1250 dispatches (Computepass Time)
                        time:   [20.198 ms 20.400 ms 20.610 ms]
                        thrpt:  [485.20 Kelem/s 490.21 Kelem/s 495.10 Kelem/s]
                 change:
                        time:   [-10.854% -9.1939% -7.4321%] (p = 0.00 < 0.05)
                        thrpt:  [+8.0288% +10.125% +12.176%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Computepass: Single Threaded/1 computepasses x 10000 dispatches (Submit Time)
                        time:   [10.087 ms 10.181 ms 10.281 ms]
                        thrpt:  [972.70 Kelem/s 982.18 Kelem/s 991.37 Kelem/s]
                 change:
                        time:   [-35.718% -34.659% -33.555%] (p = 0.00 < 0.05)
                        thrpt:  [+50.501% +53.043% +55.564%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Computepass: Single Threaded/2 computepasses x 5000 dispatches (Submit Time)
                        time:   [11.028 ms 11.129 ms 11.234 ms]
                        thrpt:  [890.17 Kelem/s 898.55 Kelem/s 906.79 Kelem/s]
                 change:
                        time:   [-32.267% -31.091% -29.847%] (p = 0.00 < 0.05)
                        thrpt:  [+42.546% +45.120% +47.638%]
                        Performance has improved.
Computepass: Single Threaded/4 computepasses x 2500 dispatches (Submit Time)
                        time:   [12.368 ms 12.456 ms 12.545 ms]
                        thrpt:  [797.11 Kelem/s 802.85 Kelem/s 808.52 Kelem/s]
                 change:
                        time:   [-28.125% -27.134% -26.125%] (p = 0.00 < 0.05)
                        thrpt:  [+35.363% +37.239% +39.131%]
                        Performance has improved.
Computepass: Single Threaded/8 computepasses x 1250 dispatches (Submit Time)
                        time:   [13.707 ms 13.818 ms 13.936 ms]
                        thrpt:  [717.56 Kelem/s 723.68 Kelem/s 729.57 Kelem/s]
                 change:
                        time:   [-24.102% -23.164% -22.189%] (p = 0.00 < 0.05)
                        thrpt:  [+28.516% +30.147% +31.756%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Computepass: Multi Threaded/2 threads x 5000 dispatch
                        time:   [9.8718 ms 9.9380 ms 10.016 ms]
                        thrpt:  [998.43 Kelem/s 1.0062 Melem/s 1.0130 Melem/s]
                 change:
                        time:   [-9.8552% -8.8156% -7.7884%] (p = 0.00 < 0.05)
                        thrpt:  [+8.4462% +9.6678% +10.933%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
Computepass: Multi Threaded/4 threads x 2500 dispatch
                        time:   [5.7890 ms 5.8287 ms 5.8719 ms]
                        thrpt:  [1.7030 Melem/s 1.7157 Melem/s 1.7274 Melem/s]
                 change:
                        time:   [-14.697% -13.393% -12.090%] (p = 0.00 < 0.05)
                        thrpt:  [+13.753% +15.464% +17.229%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Computepass: Multi Threaded/8 threads x 1250 dispatch
                        time:   [4.1858 ms 4.2230 ms 4.2613 ms]
                        thrpt:  [2.3467 Melem/s 2.3680 Melem/s 2.3890 Melem/s]
                 change:
                        time:   [-31.207% -29.893% -28.594%] (p = 0.00 < 0.05)
                        thrpt:  [+40.045% +42.640% +45.364%]
                        Performance has improved.

Computepass: Bindless/1000 dispatch
                        time:   [146.86 ms 147.21 ms 147.61 ms]
                        thrpt:  [6.7748 Kelem/s 6.7930 Kelem/s 6.8094 Kelem/s]
                 change:
                        time:   [+0.6461% +1.8619% +2.7813%] (p = 0.00 < 0.05)
                        thrpt:  [-2.7060% -1.8279% -0.6419%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Computepass: Empty Submit with 60000 Resources
                        time:   [481.52 µs 484.35 µs 487.44 µs]
                        change: [-80.991% -79.937% -78.934%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
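The throughput lines follow from the time lines. The numbers are consistent with one element per dispatch, counted across all threads: 10000 dispatches in 18.719 ms is roughly 534 Kelem/s, and "2 threads x 5000 dispatch" in 9.938 ms is roughly 1.006 Melem/s. Likewise a relative time change Δt corresponds to a throughput change of 1/(1 + Δt) - 1, which is why -4.3270% time appears as +4.5227% throughput. A small sketch of that arithmetic (the function names are illustrative):

```rust
/// Elements (here: dispatches) per second.
fn throughput(elements: f64, time_secs: f64) -> f64 {
    elements / time_secs
}

/// Maps a relative time change to the matching relative throughput change:
/// thrpt_new / thrpt_old = t_old / t_new = 1 / (1 + dt).
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

This also explains why large improvements look bigger in throughput than in time, e.g. -34.659% submit time versus +53.043% throughput.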

Testing
It is a test!

Checklist

Run cargo fmt.
Run cargo clippy. If applicable, add:
- --target wasm32-unknown-unknown
- --target wasm32-unknown-emscripten
Run cargo xtask test to run tests.
Add change to CHANGELOG.md. See simple instructions inside file.

Wumpf · 2024-06-30T12:16:16Z

Despite some mitigations, Linux is failing this benchmark ~~spuriously~~.
I need to look into that before merging, even if it shows up green on the pending re-run (mostly curious whether the same thing always fails).

Wumpf requested a review from a team as a code owner June 2, 2024 09:03

Wumpf mentioned this pull request Jun 3, 2024

Lifetimes on RenderPass make it difficult to use. #1453

Closed

nical approved these changes Jun 25, 2024


Wumpf added 6 commits June 29, 2024 14:58

compute pass benchmarks, without bindless

77e17a0

bindless benchmark for compute

8bf5fcb

less bindless dispatches because it's too slow

9800f14

changelog entry

86cee21

fix typos

2e77bbc

ignore apple paravirtual device just like on renderpass benchmark

7409c77

Wumpf force-pushed the compute-pass-benchmark branch from b764128 to 012ba30 June 29, 2024 12:59

Use r32f for storage for better compatibility

ce1960b

Wumpf force-pushed the compute-pass-benchmark branch from 012ba30 to ce1960b June 29, 2024 13:02

fix cargo benches compilation issue

a40d268

Wumpf force-pushed the compute-pass-benchmark branch from 888b3e9 to 58ae38e June 30, 2024 11:24

reduce number of draws & computes when only running tests

4548b57

Wumpf force-pushed the compute-pass-benchmark branch from 58ae38e to 4548b57 June 30, 2024 11:40

Wumpf added 2 commits July 7, 2024 11:48

Merge remote-tracking branch 'origin/trunk' into compute-pass-benchmark

01b8d96

skip compute benchmark on llvmpipe ci for now

1a5a5df

Wumpf force-pushed the compute-pass-benchmark branch from a6f4fd5 to 1a5a5df July 7, 2024 10:40

Merge branch 'trunk' into compute-pass-benchmark

34dbeb3

Wumpf merged commit d3edbc5 into gfx-rs:trunk Jul 14, 2024

Wumpf deleted the compute-pass-benchmark branch July 14, 2024 20:13

Wumpf merged 12 commits into gfx-rs:trunk from Wumpf:compute-pass-benchmark

2 participants
