vulkan: add pipeline barriers for memcpy read operations #23770

Merged
0cc4m merged 2 commits into master from 0cc4m/vulkan-host-pipeline-barriers
Jun 12, 2026

@0cc4m 0cc4m commented May 27, 2026

Contributor

Overview

Insert Vulkan pipeline barriers before/after memcpy operations for host-visible memory. This resolves the Intel read issue in #22930, and I also added the opposite direction barriers for writes. I am not sure if this is the best way. @jeffbolznv @rillomas
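
For readers unfamiliar with the pattern, here is a minimal sketch of the stage and access masks a barrier would typically carry so that GPU writes become visible to a host memcpy read. It is not taken from this PR's diff, and the masks the PR actually uses may differ. The `sketch` namespace, `BarrierMasks`, and `host_read_barrier` are hypothetical names, and the constants mirror the corresponding `VK_*` values from the Vulkan headers so the snippet compiles without the SDK.

```cpp
#include <cstdint>

// Assumption: values mirrored from the Vulkan headers so this compiles standalone.
// Real code would include <vulkan/vulkan.h> and use the VK_* names directly.
namespace sketch {

constexpr std::uint32_t STAGE_COMPUTE_SHADER  = 0x00000800; // VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT
constexpr std::uint32_t STAGE_TRANSFER        = 0x00001000; // VK_PIPELINE_STAGE_TRANSFER_BIT
constexpr std::uint32_t STAGE_HOST            = 0x00004000; // VK_PIPELINE_STAGE_HOST_BIT
constexpr std::uint32_t ACCESS_SHADER_WRITE   = 0x00000040; // VK_ACCESS_SHADER_WRITE_BIT
constexpr std::uint32_t ACCESS_TRANSFER_WRITE = 0x00001000; // VK_ACCESS_TRANSFER_WRITE_BIT
constexpr std::uint32_t ACCESS_HOST_READ      = 0x00002000; // VK_ACCESS_HOST_READ_BIT

// Hypothetical helper type: the four masks a global memory barrier needs.
struct BarrierMasks {
    std::uint32_t src_stage;   // stages that must finish first
    std::uint32_t dst_stage;   // stages that wait on the barrier
    std::uint32_t src_access;  // writes to make available
    std::uint32_t dst_access;  // accesses the writes become visible to
};

// Masks for a barrier recorded after GPU work and before the host
// reads from the mapped, host-visible allocation.
constexpr BarrierMasks host_read_barrier() {
    return BarrierMasks{
        STAGE_COMPUTE_SHADER | STAGE_TRANSFER,
        STAGE_HOST,
        ACCESS_SHADER_WRITE | ACCESS_TRANSFER_WRITE,
        ACCESS_HOST_READ,
    };
}

} // namespace sketch
```

In real code the access masks would go into a `VkMemoryBarrier` and the stage masks into the `vkCmdPipelineBarrier` call. The host must still wait for the submission to complete (for example on a fence) before the memcpy.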

@0cc4m 0cc4m requested a review from a team as a code owner May 27, 2026 08:46
@github-actions github-actions Bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels May 27, 2026
Comment thread ggml/src/ggml-vulkan/ggml-vulkan.cpp Outdated
Comment thread ggml/src/ggml-vulkan/ggml-vulkan.cpp
@0cc4m 0cc4m force-pushed the 0cc4m/vulkan-host-pipeline-barriers branch from 0a1ebc5 to d998fbf Compare June 12, 2026 07:11

0cc4m commented Jun 12, 2026

Contributor Author

I forgot to push this further, let's get it done now. @jeffbolznv This may not be all barriers that are needed, but for now it should unblock the two PRs depending on it. Any concerns?

@jeffbolznv

Contributor

Has somebody done perf testing to make sure this doesn't add a bunch of overhead? I'm less worried about it since it's limited to UMA devices. It would be nice to check perf on DGX Spark, but I don't have access to one. But I'll +1 it to unblock things.

0cc4m commented Jun 12, 2026

Contributor Author

Good idea, here's Qwen on DGX Spark with this PR, #24326 and #22930:

| model | size | params | ngl | fa | mmap | test | t/s (just this PR) | t/s (before) | t/s (after) | diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe 35B.A3B Q6_K | 27.98 GiB | 34.66 B | -1 | 1 | 0 | pp2048 | 1875.01 ± 15.53 | 1894.19 ± 6.07 | 1912.53 ± 9.98 | +1.0% |
| qwen35moe 35B.A3B Q6_K | 27.98 GiB | 34.66 B | -1 | 1 | 0 | tg128 | 49.80 ± 0.31 | 47.76 ± 0.31 | 58.32 ± 0.09 | +22.1% |
| qwen35moe 35B.A3B Q6_K | 27.98 GiB | 34.66 B | -1 | 1 | 0 | pp2048 @ d8192 | 1728.37 ± 2.84 | 1722.12 ± 8.93 | 1770.81 ± 10.26 | +2.8% |
| qwen35moe 35B.A3B Q6_K | 27.98 GiB | 34.66 B | -1 | 1 | 0 | tg128 @ d8192 | 48.42 ± 0.31 | 46.25 ± 0.55 | 55.82 ± 0.13 | +20.7% |
| qwen35 27B Q6_K | 21.62 GiB | 26.90 B | -1 | 1 | 0 | pp2048 | 603.35 ± 1.01 | 601.70 ± 1.66 | 602.84 ± 1.83 | +0.2% |
| qwen35 27B Q6_K | 21.62 GiB | 26.90 B | -1 | 1 | 0 | tg128 | 8.28 ± 0.00 | 8.24 ± 0.01 | 8.52 ± 0.00 | +3.4% |
| qwen35 27B Q6_K | 21.62 GiB | 26.90 B | -1 | 1 | 0 | pp2048 @ d8192 | 579.62 ± 1.57 | 576.50 ± 1.06 | 577.45 ± 1.55 | +0.2% |
| qwen35 27B Q6_K | 21.62 GiB | 26.90 B | -1 | 1 | 0 | tg128 @ d8192 | 8.03 ± 0.01 | 8.03 ± 0.01 | 8.37 ± 0.00 | +4.2% |

And the same on AMD Strix Halo:

| model | size | params | ngl | fa | mmap | test | t/s (just this PR) | t/s (before) | t/s (after) | diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe 35B.A3B Q6_K | 27.98 GiB | 34.66 B | -1 | 1 | 0 | pp2048 | 735.55 ± 6.47 | 732.68 ± 15.74 | 742.65 ± 29.44 | +1.4% |
| qwen35moe 35B.A3B Q6_K | 27.98 GiB | 34.66 B | -1 | 1 | 0 | tg128 | 54.18 ± 0.25 | 54.12 ± 0.22 | 54.09 ± 0.06 | -0.1% |
| qwen35moe 35B.A3B Q6_K | 27.98 GiB | 34.66 B | -1 | 1 | 0 | pp2048 @ d8192 | 621.67 ± 4.45 | 622.12 ± 3.46 | 626.14 ± 4.56 | +0.6% |
| qwen35moe 35B.A3B Q6_K | 27.98 GiB | 34.66 B | -1 | 1 | 0 | tg128 @ d8192 | 50.94 ± 0.31 | 51.13 ± 0.26 | 51.95 ± 0.17 | +1.6% |
| qwen35 27B Q6_K | 21.62 GiB | 26.90 B | -1 | 1 | 0 | pp2048 | 164.34 ± 0.85 | 166.25 ± 0.79 | 170.95 ± 0.09 | +2.8% |
| qwen35 27B Q6_K | 21.62 GiB | 26.90 B | -1 | 1 | 0 | tg128 | 8.90 ± 0.00 | 8.91 ± 0.01 | 9.00 ± 0.01 | +1.0% |
| qwen35 27B Q6_K | 21.62 GiB | 26.90 B | -1 | 1 | 0 | pp2048 @ d8192 | 146.98 ± 0.16 | 146.54 ± 0.28 | 150.10 ± 0.43 | +2.4% |
| qwen35 27B Q6_K | 21.62 GiB | 26.90 B | -1 | 1 | 0 | tg128 @ d8192 | 8.66 ± 0.01 | 8.67 ± 0.01 | 8.74 ± 0.01 | +0.8% |
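
Judging by the numbers, the diff column appears to be the relative change of "t/s (after)" against "t/s (before)". A small sketch of that arithmetic, with `percent_diff` as a hypothetical helper name:

```cpp
// Relative change of `after` versus `before`, in percent.
// Assumes before != 0.
constexpr double percent_diff(double before, double after) {
    return (after - before) / before * 100.0;
}

// DGX Spark, qwen35moe tg128 row: 47.76 t/s before, 58.32 t/s after.
constexpr double tg128_gain = percent_diff(47.76, 58.32); // about 22.1
```

The "just this PR" column is not part of this figure, so it has to be compared against "before" separately to judge the overhead of the barriers alone.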

0cc4m commented Jun 12, 2026

Contributor Author

@ggml-org/maintainers Another approval needed.

@0cc4m 0cc4m changed the title from "vulkan: add pipeline barriers for memcpy read/write operations" to "vulkan: add pipeline barriers for memcpy read operations" Jun 12, 2026
@0cc4m 0cc4m merged commit 3e7bd4f into master Jun 12, 2026
30 of 31 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-host-pipeline-barriers branch June 12, 2026 14:43
Labels

ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)

3 participants