
vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap #20059

Merged
0cc4m merged 10 commits into ggml-org:master from rillomas:fix-async-tensor-crash
Mar 12, 2026
Conversation

@rillomas (Contributor) commented on Mar 3, 2026

Fixes #19420.

Overview

We were hitting an internal limit on the number of command buffers (16383) in Intel's Windows GPU driver, causing ErrorOutOfHostMemory when loading large models (at 1 MB per transfer, 16383 transfers covers roughly 16 GB of weights, so any model larger than that hits the limit). This PR fixes the issue by reusing command buffers once their transfers have completed.

Test Results

  • llama-cli.exe -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --no-mmap no longer crashes on either the Intel iGPU or the NVIDIA dGPU. Chat results are correct as well.
  • All test-backend-ops.exe tests pass on both the MTL iGPU and the NVIDIA dGPU.
  • No Vulkan validation errors appear when running the test command on either the MTL iGPU or the NVIDIA dGPU.

Benchmark Results

  • We see no significant performance change on the MTL iGPU.
  • The NVIDIA GPU benchmarks show a lot of variance (possibly because some layers are left on the CPU?), so the results are hard to interpret. Someone may need to double-check them.

Test environment

  • model: https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/tree/main
  • command: llama-cli.exe -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --no-mmap, then ask: tell me your name
  • MTL U9-185H + GeForce RTX 4090 Laptop GPU (Win 11 25H2, Intel GPU driver 32.0.101.8531, NVIDIA GPU driver 32.0.15.9174, 64GB RAM), built with Visual Studio 2026 18.3.2
  • Could not test on an AMD GPU for lack of suitable hardware

Test Results (498ff28)

test-backend-ops log (partial)

Intel, NVIDIA results (Windows)
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 4090 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
Testing 3 devices

Backend 1/3: Vulkan0
  Device description: Intel(R) Arc(TM) Graphics
  Device memory: 37146 MB (36377 MB free)

  ABS(type=f16,ne_a=[128,2,2,2],v=0): OK
...
  GATED_DELTA_NET(type=f32,head_count=4,head_size=64,n_seq_tokens=4,n_seqs=2,v_repeat=1,permuted=1,kda=1): not supported [Vulkan0] 
  11613/11613 tests passed
  Backend Vulkan0: OK
Backend 2/3: Vulkan1
  Device description: NVIDIA GeForce RTX 4090 Laptop GPU
  Device memory: 16048 MB (15276 MB free)

  ABS(type=f16,ne_a=[128,2,2,2],v=0): OK
...
  GATED_DELTA_NET(type=f32,head_count=4,head_size=64,n_seq_tokens=4,n_seqs=2,v_repeat=1,permuted=1,kda=1): not supported [Vulkan1] 
  11613/11613 tests passed
  Backend Vulkan1: OK
Backend 3/3: CPU
  Skipping CPU backend
3/3 backends passed
OK

llama-cli logs with validation ON

MTL iGPU log
λ build_vk_validation\bin\Release\llama-cli.exe -m ..\..\Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -dev Vulkan0 --no-mmap
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 4090 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2

Loading model...
Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_EXT_pipeline_robustness, but this extension has been promoted to 1.4.0 (0x00404000).
Objects: 1
    [0] VkInstance 0x22b08560cf0

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_maintenance4, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x22b08560cf0

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_integer_dot_product, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x22b08560cf0

Validation Warning: [ BestPractices-specialuse-extension ] | MessageID = 0x675dc32e
vkCreateDevice(): Attempting to enable extension VK_KHR_pipeline_executable_properties, but this extension is intended to support developer tools such as capture-replay libraries and it is strongly recommended that it be otherwise avoided.
Objects: 1
    [0] VkInstance 0x22b08560cf0

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_EXT_subgroup_size_control, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x22b08560cf0

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_16bit_storage, but this extension has been promoted to 1.1.0 (0x00401000).
Objects: 1
    [0] VkInstance 0x22b08560cf0

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_non_semantic_info, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x22b08560cf0

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_float16_int8, but this extension has been promoted to 1.2.0 (0x00402000).
Objects: 1
    [0] VkInstance 0x22b08560cf0

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x22b4d524018 sets event VkEvent 0x390000000039 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b4d524018
    [1] VkEvent 0x390000000039

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x22b4d7a6018 sets event VkEvent 0x3d000000003d which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b4d7a6018
    [1] VkEvent 0x3d000000003d

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x22b520da018 sets event VkEvent 0x410000000041 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b520da018
    [1] VkEvent 0x410000000041

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x22b4da71018 sets event VkEvent 0x450000000045 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b4da71018
    [1] VkEvent 0x450000000045

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkCmdSetEvent(): commandBuffer VkCommandBuffer 0x22b4d524018 sets event VkEvent 0x390000000039 which was already set (in this command buffer or in the executed secondary command buffers). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b4d524018
    [1] VkEvent 0x390000000039

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x22b4d524018 sets event VkEvent 0x390000000039 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b4d524018
    [1] VkEvent 0x390000000039

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkCmdSetEvent(): commandBuffer VkCommandBuffer 0x22b4d7a6018 sets event VkEvent 0x3d000000003d which was already set (in this command buffer or in the executed secondary command buffers). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b4d7a6018
    [1] VkEvent 0x3d000000003d

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x22b4d7a6018 sets event VkEvent 0x3d000000003d which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b4d7a6018
    [1] VkEvent 0x3d000000003d

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkCmdSetEvent(): commandBuffer VkCommandBuffer 0x22b520da018 sets event VkEvent 0x410000000041 which was already set (in this command buffer or in the executed secondary command buffers). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b520da018
    [1] VkEvent 0x410000000041

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
(Warning - This VUID has now been reported 10 times, which is the duplicated_message_limit value, this will be the last time reporting it).
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x22b520da018 sets event VkEvent 0x410000000041 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x22b520da018
    [1] VkEvent 0x410000000041

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
(Warning - This VUID has now been reported 10 times, which is the duplicated_message_limit value, this will be the last time reporting it).
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.



▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8274-498ff284d
model      : Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> tell me your name

My name is Qwen! I'm a large-scale language model developed by Tongyi Lab. How can I assist you today?

[ Prompt: 2.8 t/s | Generation: 21.4 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]                  | total   free     self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - Vulkan0 (Intel(R) Arc(TM) Graphics) | 37146 =  212 + (35271 = 17524 +   17184 +     563) +        1662 |
llama_memory_breakdown_print: |   - Host                                |                   532 =   166 +       0 +     366                |
RTX 4090 Laptop GPU log
λ build_vk_validation\bin\Release\llama-cli.exe -m ..\..\Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -dev Vulkan1 --no-mmap
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 4090 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2

Loading model...
Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_EXT_pipeline_robustness, but this extension has been promoted to 1.4.0 (0x00404000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_maintenance4, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_integer_dot_product, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-specialuse-extension ] | MessageID = 0x675dc32e
vkCreateDevice(): Attempting to enable extension VK_KHR_pipeline_executable_properties, but this extension is intended to support developer tools such as capture-replay libraries and it is strongly recommended that it be otherwise avoided.
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_EXT_subgroup_size_control, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_16bit_storage, but this extension has been promoted to 1.1.0 (0x00401000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_non_semantic_info, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_float16_int8, but this extension has been promoted to 1.2.0 (0x00402000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_EXT_pipeline_robustness, but this extension has been promoted to 1.4.0 (0x00404000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_maintenance4, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-deprecated-extension ] | MessageID = 0xda8260ba
(Warning - This VUID has now been reported 10 times, which is the duplicated_message_limit value, this will be the last time reporting it).
vkCreateDevice(): Attempting to enable deprecated extension VK_KHR_shader_integer_dot_product, but this extension has been promoted to 1.3.0 (0x00403000).
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-specialuse-extension ] | MessageID = 0x675dc32e
vkCreateDevice(): Attempting to enable extension VK_KHR_pipeline_executable_properties, but this extension is intended to support developer tools such as capture-replay libraries and it is strongly recommended that it be otherwise avoided.
Objects: 1
    [0] VkInstance 0x1670c12b190

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x167515bf660 sets event VkEvent 0x630000000063 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x167515bf660
    [1] VkEvent 0x630000000063

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x167551e4080 sets event VkEvent 0x670000000067 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x167551e4080
    [1] VkEvent 0x670000000067

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x16755c41b30 sets event VkEvent 0x6b000000006b which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x16755c41b30
    [1] VkEvent 0x6b000000006b

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x16757155070 sets event VkEvent 0x6f000000006f which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x16757155070
    [1] VkEvent 0x6f000000006f

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x167515bf660 sets event VkEvent 0x630000000063 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x167515bf660
    [1] VkEvent 0x630000000063

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkCmdSetEvent(): commandBuffer VkCommandBuffer 0x167515bf660 sets event VkEvent 0x630000000063 which was already set (in this command buffer or in the executed secondary command buffers). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x167515bf660
    [1] VkEvent 0x630000000063

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x167515bf660 sets event VkEvent 0x630000000063 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x167515bf660
    [1] VkEvent 0x630000000063

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x167551e4080 sets event VkEvent 0x670000000067 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x167551e4080
    [1] VkEvent 0x670000000067

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
vkCmdSetEvent(): commandBuffer VkCommandBuffer 0x167551e4080 sets event VkEvent 0x670000000067 which was already set (in this command buffer or in the executed secondary command buffers). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x167551e4080
    [1] VkEvent 0x670000000067

Validation Warning: [ BestPractices-Event-SignalSignaledEvent ] | MessageID = 0x8302d873
(Warning - This VUID has now been reported 10 times, which is the duplicated_message_limit value, this will be the last time reporting it).
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x167551e4080 sets event VkEvent 0x670000000067 which is already in the signaled state (set by previously submitted command buffers or from the host). If this is not the desired behavior, the event must be reset before it is set again.
Objects: 2
    [0] VkCommandBuffer 0x167551e4080
    [1] VkEvent 0x670000000067

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40

Validation Warning: [ BestPractices-SpirvDeprecated_WorkgroupSize ] | MessageID = 0xd8a870c
(Warning - This VUID has now been reported 10 times, which is the duplicated_message_limit value, this will be the last time reporting it).
vkCreateComputePipelines(): pCreateInfos[0].stage is using the SPIR-V Workgroup built-in which SPIR-V 1.6 deprecated. When using VK_KHR_maintenance4 or Vulkan 1.3+, the new SPIR-V LocalSizeId execution mode should be used instead. This can be done by recompiling your shader and targeting Vulkan 1.3+.
Objects: 1
    [0] VkDevice 0x16757687f40



▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8274-498ff284d
model      : Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> tell me your name

My name is Qwen! I'm a large-scale language model developed by Tongyi Lab. How can I assist you today?

[ Prompt: 28.7 t/s | Generation: 28.4 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]            | total   free     self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - Vulkan1 (RTX 4090 Laptop GPU) | 16048 = 1016 + (14248 = 13563 +     384 +     300) +         783 |
llama_memory_breakdown_print: |   - Host                          |                  4143 =  4127 +       0 +      16                |

llama-bench Results (498ff28)

MTL iGPU

No change in pp512 (45-46 t/s) or tg128 (22-23 t/s) performance. -mmp 0 crashes on b8253, so there is no before-PR data for that case.

Before PR
λ ggml\llama-b8253-bin-win-vulkan-x64\\llama-bench.exe -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -dev Vulkan0 -mmp 1
load_backend: loaded RPC backend from C:\Users\dungeon\Documents\mnakasak\ggml\llama-b8253-bin-win-vulkan-x64\ggml-rpc.dll
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 4090 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from C:\Users\dungeon\Documents\mnakasak\ggml\llama-b8253-bin-win-vulkan-x64\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\Users\dungeon\Documents\mnakasak\ggml\llama-b8253-bin-win-vulkan-x64\ggml-cpu-alderlake.dll
| model                          |       size |     params | backend    | ngl | dev          |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------ | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  99 | Vulkan0      |           pp512 |         46.07 ± 0.21 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  99 | Vulkan0      |           tg128 |         22.09 ± 0.02 |

build: e8bbc736c (8253)
After PR
λ repo\llama.cpp_mine\build_vk\bin\Release\llama-bench.exe -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -dev Vulkan0 -mmp 0,1
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 4090 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | dev          | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------ | ---: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  99 | Vulkan0      |    0 |           pp512 |         45.17 ± 0.33 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  99 | Vulkan0      |    0 |           tg128 |         23.22 ± 0.06 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  99 | Vulkan0      |    1 |           pp512 |         45.17 ± 0.37 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  99 | Vulkan0      |    1 |           tg128 |         23.07 ± 0.19 |

build: 498ff284d (8274)

NVIDIA RTX 4090 Laptop GPU

  • Using -ngl 43, since full offload crashes with ErrorOutOfDeviceMemory in llama-bench
Before PR (b8253)
λ ggml\llama-b8253-bin-win-vulkan-x64\\llama-bench.exe -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -dev Vulkan1 -ngl 43 -mmp 0,1
load_backend: loaded RPC backend from C:\Users\dungeon\Documents\mnakasak\ggml\llama-b8253-bin-win-vulkan-x64\ggml-rpc.dll
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 4090 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from C:\Users\dungeon\Documents\mnakasak\ggml\llama-b8253-bin-win-vulkan-x64\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\Users\dungeon\Documents\mnakasak\ggml\llama-b8253-bin-win-vulkan-x64\ggml-cpu-alderlake.dll
| model                          |       size |     params | backend    | ngl | dev          | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------ | ---: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  43 | Vulkan1      |    0 |           pp512 |        759.44 ± 3.10 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  43 | Vulkan1      |    0 |           tg128 |         76.03 ± 0.30 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  43 | Vulkan1      |    1 |           pp512 |       738.68 ± 16.66 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  43 | Vulkan1      |    1 |           tg128 |         74.78 ± 1.05 |

build: e8bbc736c (8253)
After PR
λ repo\llama.cpp_mine\build_vk\bin\Release\llama-bench.exe -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -dev Vulkan1 -ngl 43 -mmp 0,1
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 4090 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | dev          | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------ | ---: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  43 | Vulkan1      |    0 |           pp512 |      624.08 ± 129.97 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  43 | Vulkan1      |    0 |           tg128 |         72.84 ± 2.64 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  43 | Vulkan1      |    1 |           pp512 |       726.58 ± 14.08 |
| qwen3moe 30B.A3B Q4_K - Medium |  17.28 GiB |    30.53 B | Vulkan     |  43 | Vulkan1      |    1 |           tg128 |         73.03 ± 1.11 |

build: 498ff284d (8274)

AI Disclosure

AI (GPT-5.3-Codex) was used for partial PoC coding, refactoring, and analysis.

@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Mar 3, 2026
@HumerousGorgon
Hello! While this does fix model loading with --no-mmap, it results in garbled outputs.
Screenshot 2026-03-03 at 8 30 27 pm

@rillomas
Contributor Author

rillomas commented Mar 4, 2026

Hi @HumerousGorgon, thanks for testing. I'm not sure how you tested this, but on my side it is working with d1dd814 using this model. Can you provide more details (which environment, which model, and how you executed it)?

llama-cli output for Qwen3.5-35B-A3B-Q4_K_M.gguf
λ repo\llama.cpp_mine\build_vk\bin\Release\llama-cli.exe -m Qwen3.5-35B-A3B-Q4_K_M.gguf --no-mmap
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8193-d1dd81479
model      : Qwen3.5-35B-A3B-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> tell me your name

Okay, the user is asking me to tell them my name. I am Qwen3.5, the latest large language model developed by Tongyi Lab. I should provide that information clearly. Let me make sure to state my name accurately. No need for extra details unless they ask. Keep it straightforward. Alright, ready to respond.
</think>

My name is **Qwen3.5**. I am a large language model developed by Tongyi Lab. How can I assist you today?

[ Prompt: 4.6 t/s | Generation: 10.8 t/s ]

@rillomas rillomas marked this pull request as ready for review March 4, 2026 08:17
@rillomas rillomas requested a review from 0cc4m as a code owner March 4, 2026 08:17
@danielmayost

Tested on Arrow Lake (140T).

b8191
PS D:\Code\llama> ./b8191/llama-bench.exe -m ./Qwen3.5-35B-A3B-Q4_K_M.gguf -mmp 1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 140T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
| qwen35moe ?B Q4_K - Medium     |  19.74 GiB |    34.66 B | Vulkan     |  99 |    1 |           pp512 |        215.68 ± 7.56 |
| qwen35moe ?B Q4_K - Medium     |  19.74 GiB |    34.66 B | Vulkan     |  99 |    1 |           tg128 |         19.18 ± 0.19 |

build: 24350fdf9 (8191)
pr
PS D:\Code\llama> ./b8191-fixed/llama-bench.exe -m ./Qwen3.5-35B-A3B-Q4_K_M.gguf -mmp 0,1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 140T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
| qwen35moe ?B Q4_K - Medium     |  19.74 GiB |    34.66 B | Vulkan     |  99 |    0 |           pp512 |       203.89 ± 16.82 |
| qwen35moe ?B Q4_K - Medium     |  19.74 GiB |    34.66 B | Vulkan     |  99 |    0 |           tg128 |         19.10 ± 0.09 |
| qwen35moe ?B Q4_K - Medium     |  19.74 GiB |    34.66 B | Vulkan     |  99 |    1 |           pp512 |        222.26 ± 7.05 |
| qwen35moe ?B Q4_K - Medium     |  19.74 GiB |    34.66 B | Vulkan     |  99 |    1 |           tg128 |         19.23 ± 0.11 |

build: 29a1a01a9 (8195)

@rillomas rillomas marked this pull request as draft March 9, 2026 05:17
@rillomas
Contributor Author

rillomas commented Mar 9, 2026

Currently seeing the following validation error on e1f8ce0. Compared to 29a1a01, which focused on fixing the async transfer part, the new approach seems to have a much wider influence and causes issues in other phases. This fix will probably take more time, since we need to thoroughly review the command buffer/fence relationship over the entire execution.

Validation Error: [ VUID-vkBeginCommandBuffer-commandBuffer-00049 ] | MessageID = 0x84029a9f
vkBeginCommandBuffer(): on active VkCommandBuffer 0x114f5ce9650 before it has completed. You must check command buffer fence before this call.
The Vulkan spec states: commandBuffer must not be in the recording or pending state (https://vulkan.lunarg.com/doc/view/1.4.321.1/windows/antora/spec/latest/chapters/cmdbuffers.html#VUID-vkBeginCommandBuffer-commandBuffer-00049)
Objects: 1
    [0] VkCommandBuffer 0x114f5ce9650

Validation Error: [ VUID-vkQueueSubmit-pCommandBuffers-00071 ] | MessageID = 0x2e2f4d65
vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x114f5ce9650 is already in use and is not marked for simultaneous use.
The Vulkan spec states: If any element of the pCommandBuffers member of any element of pSubmits was not recorded with the VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, it must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.4.321.1/windows/antora/spec/latest/chapters/cmdbuffers.html#VUID-vkQueueSubmit-pCommandBuffers-00071)
Objects: 1
    [0] VkDevice 0x114f765a020

Update: I was able to fix this by reusing command buffers more consistently in a0fecda

@rillomas rillomas marked this pull request as ready for review March 10, 2026 07:26
Copy link
Contributor

@0cc4m 0cc4m left a comment


LGTM, thank you

@0cc4m 0cc4m merged commit 5866e3b into ggml-org:master Mar 12, 2026
73 of 78 checks passed
tekintian added a commit to tekintian/llama.cpp that referenced this pull request Mar 12, 2026
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  convert : better mtp check and fix return [no ci] (ggml-org#20419)
  vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  New conversations now auto-select the first loaded model (ggml-org#20403)
  ggml-virtgpu: Fix some build commands (ggml-org#20341)
  metal : avoid divisions in bin kernel (ggml-org#20426)
  ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  opencl: use larger workgroup size for get_rows (ggml-org#20316)
  opencl: add cumsum op (ggml-org#18981)
  hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  model : add support for Phi4ForCausalLMV (ggml-org#20168)
  graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  llama : enable chunked fused GDN path (ggml-org#20340)
  llama : whitespace cleanup (ggml-org#20422)
  ggml : add NVFP4 quantization type support (ggml-org#19769)
  ...

Labels

ggml changes relating to the ggml tensor library for machine learning Vulkan Issues specific to the Vulkan backend


Development

Successfully merging this pull request may close these issues.

Eval bug: Qwen3-Coder-30B-A3B-Instruct crash in vulkan Intel ARL

4 participants