vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap#20059
Merged
0cc4m merged 10 commits into ggml-org:master on Mar 12, 2026
Conversation
Contributor
Author
Hi @HumerousGorgon, thanks for testing. I'm not sure how you tested this, but on my side it is working with d1dd814 using this model. Could you provide more details (how you executed it, on which environment, with what model)?
llama-cli output for Qwen3.5-35B-A3B-Q4_K_M.gguf
0cc4m reviewed Mar 6, 2026
Tested on Arrow Lake (140T). b8191 | pr
Contributor
Author
Currently seeing the following validation error on e1f8ce0. Compared to 29a1a01, which focused on fixing the async transfer path, the new approach has a much wider influence and is causing issues in other phases. This fix will probably take more time, since we need to thoroughly review the command buffer/fence relationship over the entire execution.
Update: I was able to fix this by reusing command buffers more consistently in a0fecda
tekintian
added a commit
to tekintian/llama.cpp
that referenced
this pull request
Mar 12, 2026
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  convert : better mtp check and fix return [no ci] (ggml-org#20419)
  vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  New conversations now auto-select the first loaded model (ggml-org#20403)
  ggml-virtgpu: Fix some build commands (ggml-org#20341)
  metal : avoid divisions in bin kernel (ggml-org#20426)
  ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  opencl: use larger workgroup size for get_rows (ggml-org#20316)
  opencl: add cumsum op (ggml-org#18981)
  hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  model : add support for Phi4ForCausalLMV (ggml-org#20168)
  graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  llama : enable chunked fused GDN path (ggml-org#20340)
  llama : whitespace cleanup (ggml-org#20422)
  ggml : add NVFP4 quantization type support (ggml-org#19769)
  ...

Fixes #19420.
Overview
We were hitting an internal maximum (16383) on the number of command buffers in Intel's Windows GPU driver, causing ErrorOutOfHostMemory when loading large models (1 MB per transfer × 16383 ≈ 16 GB of weights or more). This PR fixes the issue by reusing command buffers that have finished transferring data.
Test Results
llama-cli.exe -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --no-mmap shows no crash on both the Intel iGPU and the NVIDIA dGPU. Chat results are correct as well.
test-backend-ops.exe passes on both the MTL iGPU and the NVIDIA dGPU.
Benchmark Results
Test environment
llama-cli.exe -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --no-mmap -> ask "tell me your name"
Test Results (498ff28)
test-backend-ops log (partial)
Intel, NVIDIA results (Windows)
llama-cli logs with validation ON
MTL iGPU log
RTX 4090 Laptop GPU log
llama-bench Results (498ff28)
MTL iGPU
No change in pp512 (45-46 t/s) or tg128 (22-23 t/s) performance.
-mmp 0 crashes on b8253, so no data.
Before PR
After PR
NVIDIA RTX 4090 Laptop GPU
-ngl 43, since it crashes with ErrorOutOfDeviceMemory on llama-bench
Before PR (b8253)
After PR
AI Disclosure
AI (GPT-5.3-Codex) was used for partial PoC coding, refactoring, and analysis.