
Eval bug: --spec-draft-type-k, --spec-draft-type-v cause a crash with speculative decoding / MTP #24040

@d-shehu

Description

Name and Version

llama-cli
b9222
Ubuntu 24.04

Operating systems

Linux

GGML backends

Vulkan

Hardware

AMD Ryzen 9 5950X, 2x AMD Radeon AI PRO R9700

Models

Gemma 4 31B (Unsloth, UD-Q8_K_XL) as the main model, with Gemma 4 E2B (Unsloth, UD-Q8_K_XL) as the draft model

Problem description & steps to reproduce

When I add these two arguments, llama-cli crashes almost immediately while loading the model (the log shows a GGML_ASSERT failure followed by an abort, not a segfault). Removing just these two lets the model load, and inference has run without issues so far.

--spec-draft-type-k q8_0 
--spec-draft-type-v q8_0

See the crash output in the log section. I also tested these two flags with Granite and it ran without problems, so the crash seems to be specific to Gemma 4 (also Q8)?

Params & args (the crashing run adds the two flags above to this list):

--model ./models/models--unsloth--gemma-4-31B-it-GGUF/UD-Q8_K_XL/gemma-4-31B-it-UD-Q8_K_XL.gguf
--temp 0.0
--top-p 1.0
--top-k 1.0
--repeat-penalty 1.2

-ngl all
--jinja
-sm layer
--fit off
-fa auto
-ub 512 -b 2048
--cache-type-k q8_0
--cache-type-v q8_0

--model-draft ./models/models--unsloth--gemma-4-E2B-it-GGUF/UD-Q8_K_XL/gemma-4-E2B-it-UD-Q8_K_XL.gguf
--spec-draft-ngl all
--spec-draft-n-max 4
--spec-draft-n-min 1
--device-draft Vulkan0,Vulkan1
--seed 3457
-t 16
-dev Vulkan0,Vulkan1
--tensor-split 30,30
--no-mmap
--no-warmup
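To narrow this down further, here is a minimal A/B sketch that runs llama-cli with the arguments above, once without and once with the two suspect flags, and reports how each run ended. It assumes the binary is at ./llama-cli (as in the backtrace) and uses the model paths above; the 300-second timeout, and treating a process that is still running or exits cleanly as "got past loading", are my own heuristics, not llama.cpp behavior. On POSIX a negative return code means the process was killed by a signal, so an abort from GGML_ASSERT should show up as SIGABRT, while a real segfault would show up as SIGSEGV.

```python
import shutil
import signal
import subprocess
from typing import List, Optional

# Assumption: binary location as in the backtrace; adjust for your setup.
LLAMA_CLI = "./llama-cli"

# The arguments from this report, without the two suspect flags.
BASE_ARGS = [
    "--model", "./models/models--unsloth--gemma-4-31B-it-GGUF/UD-Q8_K_XL/gemma-4-31B-it-UD-Q8_K_XL.gguf",
    "--temp", "0.0", "--top-p", "1.0", "--top-k", "1.0", "--repeat-penalty", "1.2",
    "-ngl", "all", "--jinja", "-sm", "layer", "--fit", "off", "-fa", "auto",
    "-ub", "512", "-b", "2048", "--cache-type-k", "q8_0", "--cache-type-v", "q8_0",
    "--model-draft", "./models/models--unsloth--gemma-4-E2B-it-GGUF/UD-Q8_K_XL/gemma-4-E2B-it-UD-Q8_K_XL.gguf",
    "--spec-draft-ngl", "all", "--spec-draft-n-max", "4", "--spec-draft-n-min", "1",
    "--device-draft", "Vulkan0,Vulkan1", "--seed", "3457", "-t", "16",
    "-dev", "Vulkan0,Vulkan1", "--tensor-split", "30,30", "--no-mmap", "--no-warmup",
]

# The two flags that trigger the crash.
SUSPECT_ARGS = ["--spec-draft-type-k", "q8_0", "--spec-draft-type-v", "q8_0"]


def build_cmd(with_suspect_flags: bool) -> List[str]:
    """Full command line, optionally with the draft KV cache type flags appended."""
    cmd = [LLAMA_CLI, *BASE_ARGS]
    if with_suspect_flags:
        cmd += SUSPECT_ARGS
    return cmd


def run_once(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Return the exit code, or None if the process was still running at timeout_s."""
    try:
        # stdin is closed so an interactive session cannot block waiting for input.
        return subprocess.run(cmd, stdin=subprocess.DEVNULL, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising TimeoutExpired


def describe_exit(returncode: Optional[int]) -> str:
    """Human-readable summary of how a run ended."""
    if returncode is None:
        return "still running at timeout (got past model loading)"
    if returncode < 0:
        # Negative code = terminated by signal number -returncode (POSIX).
        return f"terminated by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


if __name__ == "__main__":
    if shutil.which(LLAMA_CLI) is None:
        print(f"{LLAMA_CLI} not found; skipping A/B run")
    else:
        for with_flags in (False, True):
            rc = run_once(build_cmd(with_flags), timeout_s=300)
            print(f"suspect flags={with_flags}: {describe_exit(rc)}")
```

Swapping the Gemma 4 model paths for the Granite ones in BASE_ARGS would give the same A/B comparison for the model that does not crash.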

First Bad Commit

No response

Relevant log output

Loading model... |llama.cpp/ggml/src/ggml-backend.cpp:1367: GGML_ASSERT(n_inputs < GGML_SCHED_MAX_SPLIT_INPUTS) failed
llama.cpp/build/b9222/vulkan/bin/libggml-base.so.0(+0x1addb)[0x7302d7279ddb]
llama.cpp/build/b9222/vulkan/bin/libggml-base.so.0(ggml_print_backtrace+0x21c)[0x7302d727a25c]
llama.cpp/build/b9222/vulkan/bin/libggml-base.so.0(ggml_abort+0x15b)[0x7302d727a43b]
llama.cpp/build/b9222/vulkan/bin/libggml-base.so.0(ggml_backend_sched_split_graph+0x224f)[0x7302d72960df]
llama.cpp/build/b9222/vulkan/bin/libllama.so.0(_ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPm+0x68b)[0x7302d68db08b]
llama.cpp/build/b9222/vulkan/bin/libllama.so.0(_ZN13llama_context13sched_reserveEv+0xa78)[0x7302d68dbc28]
llama.cpp/build/b9222/vulkan/bin/libllama.so.0(_ZN13llama_contextC1ERK11llama_model20llama_context_params+0xc39)[0x7302d68e0089]
llama.cpp/build/b9222/vulkan/bin/libllama.so.0(llama_init_from_model+0x14f)[0x7302d68e0c2f]
./llama-cli(+0x11525d)[0x5f714c0fd25d]
./llama-cli(+0x4d3bb)[0x5f714c0353bb]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7302d639a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7302d639a28b]
./llama-cli(+0x4f0c5)[0x5f714c0370c5]
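For context on the failing check: as I understand it, ggml_backend_sched_split_graph partitions the compute graph into splits that each run on one backend, and records the tensors a split must copy in from other backends as that split's inputs, with at most GGML_SCHED_MAX_SPLIT_INPUTS per split. The toy Python model below illustrates only that invariant. It is not ggml's code (the real scheduler has more rules than this), the cap of 4 is made up, and the tensor names are hypothetical. Whether the q8_0 draft KV cache spread across Vulkan0 and Vulkan1 is what pushes a split past the limit here is a guess on my part.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical cap for illustration only; the real GGML_SCHED_MAX_SPLIT_INPUTS
# value is defined in ggml-backend.cpp and is not assumed here.
MAX_SPLIT_INPUTS = 4


class SplitInputOverflow(RuntimeError):
    """Stands in for the GGML_ASSERT(n_inputs < GGML_SCHED_MAX_SPLIT_INPUTS) abort."""


@dataclass
class Node:
    name: str
    backend: str
    srcs: List[str] = field(default_factory=list)


@dataclass
class Split:
    backend: str
    nodes: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)


def split_graph(nodes: List[Node], leaf_backend: Dict[str, str],
                max_inputs: int = MAX_SPLIT_INPUTS) -> List[Split]:
    """Toy splitter: consecutive nodes on the same backend share a split, and every
    source tensor that lives on another backend becomes an input of that split."""
    location = dict(leaf_backend)  # tensor name -> backend holding it
    splits: List[Split] = []
    for node in nodes:
        if not splits or splits[-1].backend != node.backend:
            splits.append(Split(node.backend))
        split = splits[-1]
        for src in node.srcs:
            if location[src] != node.backend and src not in split.inputs:
                # Same shape as the failing check: the new input's index must be < cap.
                if len(split.inputs) >= max_inputs:
                    raise SplitInputOverflow(
                        f"split on {split.backend} needs more than {max_inputs} inputs")
                split.inputs.append(src)
        split.nodes.append(node.name)
        location[node.name] = node.backend
    return splits


if __name__ == "__main__":
    # Hypothetical tensors: activations on Vulkan0, cache tensors on Vulkan1.
    leaves = {"x": "Vulkan0", **{t: "Vulkan1" for t in ("k0", "v0", "k1", "v1", "k2")}}
    ok = split_graph([Node("attn", "Vulkan0", ["x", "k0", "v0"])], leaves)
    print(ok[0].inputs)  # ['k0', 'v0']
    try:
        split_graph([Node("attn", "Vulkan0", ["x", "k0", "v0", "k1", "v1", "k2"])], leaves)
    except SplitInputOverflow as err:
        print("overflow:", err)
```

In the toy, a node that reads five tensors from the other device overflows a cap of four, which is the same kind of failure the backtrace reports during graph_reserve.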
