(gdb) r
Starting program: /home/anholt/src/llama.cpp/build-aarch64/bin/llama-bench -ngl 99 -m /home/anholt/.cache/llama.cpp/ggml-org_Nomic-Embed-Text-V2-GGUF_nomic-embed-text-v2-moe-q8_0.gguf
[...]
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Adreno X1-85 (turnip Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 128 | shared memory: 32768 | int dot: 1 | matrix cores: none
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
[...]
Validation Error: [ VUID-RuntimeSpirv-Workgroup-06530 ] | MessageID = 0xac32b098
vkCreateComputePipelines(): pCreateInfos[0].stage SPIR-V uses 35840 bytes of shared memory, which is more than maxComputeSharedMemorySize (32768).
The Vulkan spec states: The sum of size in bytes for variables and padding in the Workgroup Storage Class in the GLCompute Execution Model must be less than or equal to maxComputeSharedMemorySize (https://docs.vulkan.org/spec/latest/appendices/spirvenv.html#VUID-RuntimeSpirv-Workgroup-06530)
Objects: 1
[0] VkShaderModule 0x310000000031
TypeToDescriptorTypeSet: Starting with type id 798 opcode 32, dtid 0, trid 798
TypeToDescriptorTypeSet: Starting with type id 168 opcode 32, dtid 0, trid 168
TypeToDescriptorTypeSet: Starting with type id 53 opcode 32, dtid 0, trid 53
MESA: error: Compute shader ((null)) which has workgroup barrier cannot be used because it's impossible to have enough (2) concurrent waves (0 due to shared, 64 due to branchstack).
[Switching to Thread 0xffffecfeb4a0 (LWP 22379)]
Thread 11 "llama-bench" hit Breakpoint 1, ir3_get_reg_independent_max_waves (v=v@entry=0xffffd83a5260,
double_threadsize=double_threadsize@entry=false) at ../src/freedreno/ir3/ir3.c:300
300 exit(1);
(gdb) bt
#0 ir3_get_reg_independent_max_waves (v=v@entry=0xffffd83a5260, double_threadsize=double_threadsize@entry=false)
at ../src/freedreno/ir3/ir3.c:300
#1 0x0000fffff11da2dc in calc_target_full_pressure (v=0xffffd83a5260, pressure=<optimized out>)
at ../src/freedreno/ir3/ir3_ra.c:2537
#2 ir3_ra (v=v@entry=0xffffd83a5260) at ../src/freedreno/ir3/ir3_ra.c:2875
#3 0x0000fffff1189ed0 in ir3_compile_shader_nir (compiler=<optimized out>, shader=shader@entry=0xffffd80c59e0,
so=so@entry=0xffffd83a5260) at ../src/freedreno/ir3/ir3_compiler_nir.c:6111
#4 0x0000fffff11e5c24 in compile_variant (shader=shader@entry=0xffffd80c59e0, v=v@entry=0xffffd83a5260)
at ../src/freedreno/ir3/ir3_shader.c:453
#5 0x0000fffff11e6078 in create_variant (shader=0xffffd80c59e0, key=0xffffecfea360, write_disasm=<optimized out>,
mem_ctx=0x0) at ../src/freedreno/ir3/ir3_shader.c:629
#6 0x0000fffff10c149c in tu_shader_create (dev=dev@entry=0xaaaab2b6b3c0, shader_out=shader_out@entry=0xffffecfea300,
nir=<optimized out>, key=key@entry=0xffffecfea320, info=info@entry=0xffffecfea2f8, ir3_key=<optimized out>,
key_data=key_data@entry=0xffffecfea340, key_size=key_size@entry=32, layout=<optimized out>,
layout@entry=0xffffd8003fa0, executable_info=<optimized out>, executable_info@entry=false)
at ../src/freedreno/vulkan/tu_shader.cc:3152
#7 0x0000fffff1078ccc in tu_compute_pipeline_create<(chip)7> (device=0xaaaab2b6b3c0, pipelineCache=0x0,
pCreateInfo=0xffffd80b35f8, flags=64, pAllocator=0x0, pPipeline=0xffffecfea900)
at ../src/freedreno/vulkan/tu_pipeline.cc:4976
#8 tu_CreateComputePipelines<(chip)7> (device=0xaaaab2b6b3c0, pipelineCache=<optimized out>, count=<optimized out>,
pCreateInfos=<optimized out>, pAllocator=<optimized out>, pPipelines=<optimized out>)
at ../src/freedreno/vulkan/tu_pipeline.cc:5058
#9 0x0000ffffef66da2c in vvl::dispatch::Device::CreateComputePipelines (this=0xaaaab2a73e80, device=<optimized out>,
pipelineCache=<optimized out>, createInfoCount=1, pCreateInfos=0xffffecfea930, pAllocator=0x0,
pPipelines=<optimized out>)
at /home/anholt/src/Vulkan-ValidationLayers/layers/chassis/dispatch_object_manual.cpp:2310
#10 0x0000ffffef661070 in vulkan_layer_chassis::CreateComputePipelines (device=0xaaaab2b6b3c0, pipelineCache=0x0,
createInfoCount=1, pCreateInfos=0xffffecfea930, pAllocator=0x0, pPipelines=0xffffecfea900)
at /home/anholt/src/Vulkan-ValidationLayers/layers/chassis/chassis_manual.cpp:617
#11 0x0000fffff32697a0 in vk::Device::createComputePipeline<vk::detail::DispatchLoaderDynamic, true> (
this=0xaaaab2a3b758, pipelineCache=..., createInfo=..., allocator=..., d=...)
at /usr/include/vulkan/vulkan_funcs.hpp:3830
#12 ggml_vk_create_pipeline_func (device=std::shared_ptr<vk_device_struct> (use count 10, weak count 2) = {...},
pipeline=std::shared_ptr<vk_pipeline_struct> (use count 2, weak count 0) = {...}, spv_size=<optimized out>,
spv_data=<optimized out>, entrypoint="main", parameter_count=<optimized out>, wg_denoms=...,
specialization_constants=..., disable_robustness=<optimized out>, require_full_subgroups=<optimized out>,
required_subgroup_size=<optimized out>) at /home/anholt/src/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:2284
[...]
(gdb) frame 12
list
#12 ggml_vk_create_pipeline_func (device=std::shared_ptr<vk_device_struct> (use count 10, weak count 2) = {...},
pipeline=std::shared_ptr<vk_pipeline_struct> (use count 2, weak count 0) = {...}, spv_size=<optimized out>,
spv_data=<optimized out>, entrypoint="main", parameter_count=<optimized out>, wg_denoms=...,
specialization_constants=..., disable_robustness=<optimized out>, require_full_subgroups=<optimized out>,
required_subgroup_size=<optimized out>) at /home/anholt/src/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:2284
2284 pipeline->pipeline = device->device.createComputePipeline(VK_NULL_HANDLE, compute_pipeline_create_info).value;
(gdb) p pipeline->name
$1 = "matmul_q8_0_q8_1_l"
Name and Version
e77056f ("CUDA: use fastdiv for batch index split in get_rows (#22650)")
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
`/home/anholt/src/llama.cpp/build-aarch64/bin/llama-bench -ngl 99 -m /home/anholt/.cache/llama.cpp/ggml-org_Nomic-Embed-Text-V2-GGUF_nomic-embed-text-v2-moe-q8_0.gguf`
Problem description & steps to reproduce
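On turnip, once I set `integerDotProduct4x8BitPackedSignedAccelerated` to enable int dot support, I get Vulkan validation failures. The `matmul_q8_0_q8_1_l` shader uses 35840 bytes of shared memory, which exceeds the device's `maxComputeSharedMemorySize` of 32768, and the driver then fails to compile `matmul_q8_0_q8_1_l`. A gdb excerpt is included.
First Bad Commit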
No response
Relevant log output
Logs