Name and Version
llama-cli
b9222
Ubuntu 24.04
Operating systems
Linux
GGML backends
Vulkan
Hardware
5950X, (dual) R9700 AI Pro
Models
Gemma 4 31B - Unsloth - UD Q8 K XL
Problem description & steps to reproduce
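When the two arguments below are added, llama-cli crashes almost immediately while loading the model. The log shows a failed GGML_ASSERT followed by an abort rather than a true segmentation fault. Removing just these two arguments lets the model load, and inference has run without issues so far.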
--spec-draft-type-k q8_0
--spec-draft-type-v q8_0
See the crash output in the log section. I also tested these two arguments with Granite and it ran without problems, so the crash seems to be specific to Gemma 4 (also Q8)?
Params & Args:
--model ./models/models--unsloth--gemma-4-31B-it-GGUF/UD-Q8_K_XL/gemma-4-31B-it-UD-Q8_K_XL.gguf
--temp 0.0
--top-p 1.0
--top-k 1.0
--repeat-penalty 1.2
-ngl all
--jinja
-sm layer
--fit off
-fa auto
-ub 512 -b 2048
--cache-type-k q8_0
--cache-type-v q8_0
--model-draft ./models/models--unsloth--gemma-4-E2B-it-GGUF/UD-Q8_K_XL/gemma-4-E2B-it-UD-Q8_K_XL.gguf
--spec-draft-ngl all
--spec-draft-n-max 4
--spec-draft-n-min 1
--device-draft Vulkan0,Vulkan1
--seed 3457
-t 16
-dev Vulkan0,Vulkan1
--tensor-split 30,30
--no-mmap
--no-warmup
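For anyone trying to reproduce this, the arguments above can be assembled into a pair of commands that differ only in the two draft cache-type flags. This is a rough sketch, not the exact invocation I used: every flag and path is copied from the list above, while the `build_args` helper and the `MODEL`/`DRAFT` variable names are made up for illustration. It only prints the two commands so they can be inspected before running.

```shell
#!/bin/sh
# Sketch: print an A/B pair of llama-cli commands for this report.
# Flags and paths are copied from the argument list above.
# build_args, MODEL and DRAFT are illustrative names, not part of llama.cpp.

MODEL=./models/models--unsloth--gemma-4-31B-it-GGUF/UD-Q8_K_XL/gemma-4-31B-it-UD-Q8_K_XL.gguf
DRAFT=./models/models--unsloth--gemma-4-E2B-it-GGUF/UD-Q8_K_XL/gemma-4-E2B-it-UD-Q8_K_XL.gguf

# $1 = "with" adds the two draft cache-type flags; any other value omits them.
build_args() {
  args="--model $MODEL --temp 0.0 --top-p 1.0 --top-k 1.0 --repeat-penalty 1.2"
  args="$args -ngl all --jinja -sm layer --fit off -fa auto -ub 512 -b 2048"
  args="$args --cache-type-k q8_0 --cache-type-v q8_0"
  args="$args --model-draft $DRAFT --spec-draft-ngl all"
  args="$args --spec-draft-n-max 4 --spec-draft-n-min 1"
  args="$args --device-draft Vulkan0,Vulkan1 --seed 3457 -t 16"
  args="$args -dev Vulkan0,Vulkan1 --tensor-split 30,30 --no-mmap --no-warmup"
  if [ "$1" = "with" ]; then
    args="$args --spec-draft-type-k q8_0 --spec-draft-type-v q8_0"
  fi
  printf '%s\n' "$args"
}

# Case A: loads and runs for me.
echo "./llama-cli $(build_args without)"
# Case B: aborts during load with the GGML_ASSERT shown in the log.
echo "./llama-cli $(build_args with)"
```

The paths contain no spaces, so a plain string is enough here. With other paths, quoting would need more care.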
First Bad Commit
No response
Relevant log output
Loading model... |llama.cpp/ggml/src/ggml-backend.cpp:1367: GGML_ASSERT(n_inputs < GGML_SCHED_MAX_SPLIT_INPUTS) failed
llama.cpp/build/b9222/vulkan/bin/libggml-base.so.0(+0x1addb)[0x7302d7279ddb]
llama.cpp/build/b9222/vulkan/bin/libggml-base.so.0(ggml_print_backtrace+0x21c)[0x7302d727a25c]
llama.cpp/build/b9222/vulkan/bin/libggml-base.so.0(ggml_abort+0x15b)[0x7302d727a43b]
llama.cpp/build/b9222/vulkan/bin/libggml-base.so.0(ggml_backend_sched_split_graph+0x224f)[0x7302d72960df]
llama.cpp/build/b9222/vulkan/bin/libllama.so.0(_ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPm+0x68b)[0x7302d68db08b]
llama.cpp/build/b9222/vulkan/bin/libllama.so.0(_ZN13llama_context13sched_reserveEv+0xa78)[0x7302d68dbc28]
llama.cpp/build/b9222/vulkan/bin/libllama.so.0(_ZN13llama_contextC1ERK11llama_model20llama_context_params+0xc39)[0x7302d68e0089]
llama.cpp/build/b9222/vulkan/bin/libllama.so.0(llama_init_from_model+0x14f)[0x7302d68e0c2f]
./llama-cli(+0x11525d)[0x5f714c0fd25d]
./llama-cli(+0x4d3bb)[0x5f714c0353bb]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7302d639a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7302d639a28b]
./llama-cli(+0x4f0c5)[0x5f714c0370c5]