
vulkan: set all memory allocations to high priority#17624

Merged
0cc4m merged 2 commits into ggml-org:master from jeffbolznv:priority on Dec 5, 2025

Conversation

@jeffbolznv (Collaborator)

For #17605, though I'm not sure whether it'll help.

@0cc4m (Collaborator) commented Nov 30, 2025

Even if it does, I don't think it's always preferable to keep models in VRAM, over other data. So at the very least we'd have to make it configurable.

@netrunnereve do you know if the extension makes a difference for RADV memory eviction behaviour? Maybe it's a way to keep models loaded without the RADV-side flag.

github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Nov 30, 2025
@netrunnereve (Collaborator)

> Even if it does, I don't think it's always preferable to keep models in VRAM, over other data. So at the very least we'd have to make it configurable.
>
> @netrunnereve do you know if the extension makes a difference for RADV memory eviction behaviour? Maybe it's a way to keep models loaded without the RADV-side flag.

I'm pretty sure RADV doesn't care about this extension and just handles memory like it usually does: it tries to put things in VRAM where possible (which doesn't always happen), and once memory gets swapped out of VRAM it doesn't know how to bring it back.

The only way to deal with this is to use my nogttspill flag, and #17605 is literally a textbook example of why we need it. I suppose we could ask Mesa whether they're interested in supporting VK_EXT_memory_priority, and basically have it disable GTT allocations when the priority is set high enough.

@jeffbolznv (Collaborator, Author)

I'm not too surprised this didn't help, and am OK with abandoning it. This does get hooked up to WDDM priorities on Windows, and it might help more there.

@netrunnereve (Collaborator)

> I'm not too surprised this didn't help, and am OK with abandoning it. This does get hooked up to WDDM priorities on Windows and it might help more there.

Personally I don't mind this as long as it's optional and disabled by default. Like you said there are other systems which can probably make use of this.

@0cc4m (Collaborator) commented Dec 2, 2025

Since you already implemented it, let's just keep it disabled by default and enable with an environment variable.

@jeffbolznv (Collaborator, Author)

Added the env var.

@jeffbolznv marked this pull request as ready for review on December 2, 2025 at 15:42
@0cc4m merged commit 93bb926 into ggml-org:master on Dec 5, 2025 (70 of 74 checks passed)
JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request Dec 7, 2025
* vulkan: set all memory allocations to high priority

* gate by env var
0Marble pushed a commit to 0Marble/llama.cpp that referenced this pull request Dec 18, 2025
* vulkan: set all memory allocations to high priority

* gate by env var
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* vulkan: set all memory allocations to high priority

* gate by env var
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* vulkan: set all memory allocations to high priority

* gate by env var

Labels

ggml (changes relating to the ggml tensor library for machine learning), Vulkan (issues specific to the Vulkan backend)


3 participants