Name and Version
version 9438
compiled for CUDA 13.2
Tested on
Linux CachyOS
5060 Ti 16 GB + 2060 Super, llama.cpp built with CUDA 13.2
Windows 11
2x 5060 Ti 16 GB, llama.cpp built with CUDA 13.2
Build commands:
cmake -B build -G Ninja \
-DGGML_CUDA=ON \
-DCMAKE_CUDA_ARCHITECTURES="native" \
-DGGML_CUDA_FA_ALL_QUANTS=ON \
-DGGML_CUDA_F16=ON \
-DGGML_NATIVE=ON \
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON \
-DCMAKE_BUILD_TYPE=Release \
-DLLAMA_BUILD_TESTS=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DLLAMA_BUILD_SERVER=ON
cmake --build build --config Release -j$(nproc)
Operating systems
Linux CachyOS
Windows 11
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./build/bin/llama-server \
--verbosity 4 \
-m /home/abdulrahman/Personal/Programs/llama/models/Qwen/Qwen3.6-27B-Q4_K_S.gguf \
--chat_template_kwargs '{"preserve_thinking": "True"}' \
--jinja \
--host 0.0.0.0 --port 8080 \
--spec-type draft-mtp,ngram-mod --spec-draft-n-max 2 --spec-ngram-mod-n-match 24 --spec-ngram-mod-n-min 2 --spec-ngram-mod-n-max 48 \
-c 64000 \
-sm tensor --tensor-split 66,24 -ub 48
Problem description & steps to reproduce
llama-server crashes as soon as ngram drafting starts generating tokens. Generation lags for a moment, stops, and then the process crashes. The only error reported is:
terminated by signal SIGSEGV (Address boundary error)
No other error message appears.