Name and Version
version: 9527 (9c955c48b)
built with Clang 21.1.7 for Linux x86_64
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./build/bin/llama-server --jinja --model ~/llama/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-Q4_K_XL.gguf --threads -1 -fa on -ctv q5_1 -ctk q8_0 --host 0.0.0.0 --temp 1.0 --top-k 20 --top-p 0.95 --min-p 0 -fitt 320 --repeat-penalty 1.0 --presence-penalty 0.0 --reasoning on -np 1 -n 32768 --mmproj ~/llama/Qwen3.6-27B-MTP-GGUF/mmproj-F32.gguf --no-mmproj-offload --image-min-tokens 1024 --spec-type draft-mtp --spec-draft-n-max 3 --spec-draft-p-min 0.6 -lv 4 --cache-ram 32768 -c 131072
Problem description & steps to reproduce
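According to the logs, one layer no longer seems to be offloaded to the GPU: load_tensors reports "offloaded 65/66 layers to GPU" (the output layer plus 64 repeating layers).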
First Bad Commit
Very likely caused by #24060
Relevant log output
0.02.063.505 I load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
0.03.232.046 I load_tensors: offloading output layer to GPU
0.03.232.049 I load_tensors: offloading 64 repeating layers to GPU
0.03.232.050 I load_tensors: offloaded 65/66 layers to GPU
0.03.232.053 I load_tensors: CPU_Mapped model buffer size = 942.97 MiB
0.03.232.053 I load_tensors: CUDA0 model buffer size = 16126.00 MiB
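To check for this automatically, for example while bisecting, something like the following minimal sketch could be run against the captured server log. It assumes the summary line keeps the exact "offloaded N/M layers to GPU" wording shown above; the function names are hypothetical and not part of llama.cpp.

```python
import re

# Matches the load_tensors summary line as it appears in the log above.
# Assumption: the wording of this line is stable across the commits tested.
OFFLOAD_RE = re.compile(r"load_tensors: offloaded (\d+)/(\d+) layers to GPU")


def parse_offload(log_text):
    """Return (offloaded, total) from the first summary line, or None if absent."""
    match = OFFLOAD_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def fully_offloaded(log_text):
    """True only if a summary line exists and every layer was offloaded."""
    counts = parse_offload(log_text)
    return counts is not None and counts[0] == counts[1]


# Sample line copied from the log output in this report.
sample = "0.03.232.050 I load_tensors: offloaded 65/66 layers to GPU"
print(parse_offload(sample))
print(fully_offloaded(sample))
```

A bisect helper script could exit non-zero whenever fully_offloaded returns False for the log of the build under test.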