Misc. bug: -sm tensor + MTP + ngram-mod = crash #23929

@AbdulrahmanHashem

Name and Version

version 9438
compiled for CUDA 13.2

Tested on

Linux CachyOS
5060 Ti 16 GB + 2060 Super, llama.cpp built with CUDA 13.2

Windows 11
2x 5060 Ti 16 GB, llama.cpp built with CUDA 13.2

cmake -B build -G Ninja \
      -DGGML_CUDA=ON \
      -DCMAKE_CUDA_ARCHITECTURES="native" \
      -DGGML_CUDA_FA_ALL_QUANTS=ON \
      -DGGML_CUDA_F16=ON \
      -DGGML_NATIVE=ON \
      -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLAMA_BUILD_TESTS=OFF \
      -DLLAMA_BUILD_EXAMPLES=ON \
      -DLLAMA_BUILD_SERVER=ON

  cmake --build build --config Release -j$(nproc)

Operating systems

Linux CachyOS
Windows 11

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./build/bin/llama-server \
      --verbosity 4 \
      -m /home/abdulrahman/Personal/Programs/llama/models/Qwen/Qwen3.6-27B-Q4_K_S.gguf \
      --chat_template_kwargs '{"preserve_thinking": "True"}' \
      --jinja \
      --host 0.0.0.0 --port 8080 \
      --spec-type draft-mtp,ngram-mod --spec-draft-n-max 2 --spec-ngram-mod-n-match 24 --spec-ngram-mod-n-min 2 --spec-ngram-mod-n-max 48 \
      -c 64000 \
      -sm tensor --tensor-split 66,24 -ub 48

Problem description & steps to reproduce

llama-server crashes when ngram starts generating tokens. Generation lags for a bit, stops, and then the process crashes. The only error shown is "terminated by signal SIGSEGV (Address boundary error)"; nothing else is printed.
