
vulkan: set all memory allocations to high priority#17624

Merged
0cc4m merged 2 commits into ggml-org:master from jeffbolznv:priority on Dec 5, 2025

Conversation

@jeffbolznv (Collaborator)

For #17605, though I'm not sure whether it'll help.

@0cc4m (Collaborator) commented Nov 30, 2025

Even if it does, I don't think it's always preferable to keep models in VRAM, over other data. So at the very least we'd have to make it configurable.

@netrunnereve do you know if the extension makes a difference for RADV memory eviction behaviour? Maybe it's a way to keep models loaded without the RADV-side flag.

github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Nov 30, 2025
@netrunnereve (Collaborator)

> Even if it does, I don't think it's always preferable to keep models in VRAM, over other data. So at the very least we'd have to make it configurable.
>
> @netrunnereve do you know if the extension makes a difference for RADV memory eviction behaviour? Maybe it's a way to keep models loaded without the RADV-side flag.

I'm pretty sure RADV doesn't care about this extension and just handles memory like it usually does: it tries to put things in VRAM where possible (which doesn't always happen), and once memory gets swapped out of VRAM it doesn't know how to bring it back.

The only way to deal with this is to use my nogttspill flag, and #17605 is literally a textbook example of why we need it. I suppose we could ask Mesa whether they're interested in supporting VK_EXT_memory_priority, and basically have it disable GTT allocations when the priority is set high enough.

@jeffbolznv (Collaborator, Author)

I'm not too surprised this didn't help, and am OK with abandoning it. This does get hooked up to WDDM priorities on Windows, and it might help more there.

@netrunnereve (Collaborator)

> I'm not too surprised this didn't help, and am OK with abandoning it. This does get hooked up to WDDM priorities on Windows and it might help more there.

Personally I don't mind this as long as it's optional and disabled by default. Like you said there are other systems which can probably make use of this.

@0cc4m (Collaborator) commented Dec 2, 2025

Since you already implemented it, let's just keep it disabled by default and enable with an environment variable.

@jeffbolznv (Collaborator, Author)

Added the env var.

@jeffbolznv marked this pull request as ready for review on December 2, 2025 at 15:42
@0cc4m merged commit 93bb926 into ggml-org:master on Dec 5, 2025 (70 of 74 checks passed)
JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request Dec 7, 2025
* vulkan: set all memory allocations to high priority

* gate by env var
0Marble pushed a commit to 0Marble/llama.cpp that referenced this pull request Dec 18, 2025
* vulkan: set all memory allocations to high priority

* gate by env var
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* vulkan: set all memory allocations to high priority

* gate by env var
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* vulkan: set all memory allocations to high priority

* gate by env var

Labels

ggml (changes relating to the ggml tensor library for machine learning), Vulkan (issues specific to the Vulkan backend)


3 participants