Misc. bug: Off-by-one error causes one layer to not go on the GPU #24183

@gcp

Name and Version

version: 9527 (9c955c48b)
built with Clang 21.1.7 for Linux x86_64

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./build/bin/llama-server --jinja --model ~/llama/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-Q4_K_XL.gguf --threads -1 -fa on -ctv q5_1 -ctk q8_0 --host 0.0.0.0 --temp 1.0 --top-k 20 --top-p 0.95 --min-p 0 -fitt 320 --repeat-penalty 1.0 --presence-penalty 0.0 --reasoning on -np 1 -n 32768 --mmproj ~/llama/Qwen3.6-27B-MTP-GGUF/mmproj-F32.gguf --no-mmproj-offload --image-min-tokens 1024 --spec-type draft-mtp --spec-draft-n-max 3 --spec-draft-p-min 0.6 -lv 4 --cache-ram 32768 -c 131072

Problem description & steps to reproduce

According to the logs, one layer is no longer offloaded to the GPU: the loader reports "offloaded 65/66 layers to GPU" (see the log output below).

First Bad Commit

Very likely to be caused by #24060

Relevant log output

0.02.063.505 I load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
0.03.232.046 I load_tensors: offloading output layer to GPU
0.03.232.049 I load_tensors: offloading 64 repeating layers to GPU
0.03.232.050 I load_tensors: offloaded 65/66 layers to GPU
0.03.232.053 I load_tensors:   CPU_Mapped model buffer size =   942.97 MiB
0.03.232.053 I load_tensors:        CUDA0 model buffer size = 16126.00 MiB
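
For anyone comparing against their own run, the layer counts in the log above can be checked with a short script. This is only a sketch and not part of llama.cpp: the helper name `summarize_offload` is hypothetical, and the regular expressions assume the exact log wording shown in this report. It does not say anything about the cause, it just makes the mismatch explicit (output layer plus 64 repeating layers gives 65 offloaded, out of a reported total of 66).

```python
import re

# Log lines copied from the report above (timestamps kept as-is).
LOG = """\
0.03.232.046 I load_tensors: offloading output layer to GPU
0.03.232.049 I load_tensors: offloading 64 repeating layers to GPU
0.03.232.050 I load_tensors: offloaded 65/66 layers to GPU
"""


def summarize_offload(log: str) -> dict:
    """Extract the layer counts from load_tensors log lines (hypothetical helper)."""
    # Number of repeating layers the loader says it is offloading.
    repeating = int(
        re.search(r"offloading (\d+) repeating layers to GPU", log).group(1)
    )
    # The output layer is reported on its own line.
    output = 1 if "offloading output layer to GPU" in log else 0
    # Final summary line: offloaded X/Y layers.
    offloaded, total = map(
        int, re.search(r"offloaded (\d+)/(\d+) layers to GPU", log).groups()
    )
    return {
        "repeating": repeating,
        "output": output,
        "offloaded": offloaded,
        "total": total,
        "not_offloaded": total - offloaded,
    }


summary = summarize_offload(LOG)
print(summary)
```

Running the same function on a log from a build before the suspected commit should show whether the reported total or the offloaded count changed.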
