vulkan: change graph_compute to be async and enable get_tensor_async #17158
Merged
0cc4m merged 4 commits into ggml-org:master on Nov 15, 2025
Conversation
This allows some additional CPU/GPU overlap for large pp workloads. Also seems to help a bit for token gen, maybe getting rid of a small bubble between graph_compute and get_tensor.

Async set and copy functions seem to be very rarely used, so I didn't enable them because I didn't have a good way to test them.

The async commands need to be ordered against each other, so put them all on the compute queue. The non-async commands still use the transfer queue.

The fence for graph_compute/get_tensor_async is submitted and waited on in ggml_vk_synchronize.
a66edc0 to 924df57
0cc4m (Collaborator) approved these changes on Nov 15, 2025 and left a comment:
I can't see any performance differences, but no issues either.
Commit 38eaf32 makes my Intel DG1 vomit gibberish. Will you be able to have another look? Thanks.
Collaborator:
Are you sure it's not just #17106?
Collaborator (Author):
Please file an issue for this and share more details. What were you running? Does test-backend-ops pass?
Thanks. I filed an issue here: #17302
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request on Jan 15, 2026: vulkan: change graph_compute to be async and enable get_tensor_async (ggml-org#17158). Follow-up commits: fix thread safety errors; teardown context cleanly; Handle async read to non-pinned dst.
blime4 referenced this pull request in blime4/llama.cpp on Feb 5, 2026: vulkan: change graph_compute to be async and enable get_tensor_async (#17158). Follow-up commits: fix thread safety errors; teardown context cleanly; Handle async read to non-pinned dst.
See #17033 (comment).