Name and Version
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 49151 MiB):
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24575 MiB
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24575 MiB
version: 8667 (c08d28d)
built with MSVC 19.44.35209.0 for x64
Operating systems
Windows
GGML backends
CUDA
Hardware
Intel Core i9-10900K
2x RTX 3090
Models
Gemma 4 31B-it Q8_0 GGUF from bartowski
My SHA256 is good: 2BADB6EED44009D6790F7204FE9D37FDB7BE278BE206D6C75B5672E416D41716
Problem description & steps to reproduce
With my own build, other models load and run without issue, but Gemma 4 fails to load.
Gemma 4 does load and run with the prebuilt binaries. I get the same warning there (str: cannot properly format tensor name output with suffix=weight bid=-1 xid=-1), but after that it's fine.
My build command:
cmake -B build -DGGML_CUDA=ON -DGGML_SCHED_MAX_COPIES="1"
cmake --build build --config Release -j 19
My launch command:
@echo off
set CUDA_VISIBLE_DEVICES=0,1
llama-server.exe ^
-m "T:\models\google_gemma-4-31B-it-Q8_0.gguf" ^
-ts 23,20.5 ^
-c 65536 ^
--ubatch-size 4096 ^
--checkpoint-every-n-tokens 2048 ^
--no-mmap
First Bad Commit
No response
Relevant log output
Logs
load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
str: cannot properly format tensor name output with suffix=weight bid=-1 xid=-1
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.48.attn_q.weight' has wrong shape; expected 5376, 16384, got 5376, 8192, 1, 1
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'T:\models\google_gemma-4-31B-it-Q8_0.gguf'
srv load_model: failed to load model, 'T:\models\google_gemma-4-31B-it-Q8_0.gguf'
srv operator (): operator (): cleaning up before exit...
main: exiting due to model loading error
Logs with --verbose:
logs.txt (attached)
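For triage, the shape mismatch reported above can be pulled out of the log line mechanically. The following is only a minimal sketch: it assumes the message format is exactly what appears in my log, and `parse_shape_error` and `mismatches` are hypothetical helper names, not part of llama.cpp.

```python
import re

# Error line copied from the log output above.
LINE = ("llama_model_load: error loading model: check_tensor_dims: tensor "
        "'blk.48.attn_q.weight' has wrong shape; expected 5376, 16384, "
        "got 5376, 8192, 1, 1")

# Assumed message format, taken from the single log line above.
PATTERN = re.compile(
    r"tensor '([^']+)' has wrong shape; expected ([\d, ]+), got ([\d, ]+)"
)

def parse_shape_error(line):
    """Return (tensor_name, expected_dims, got_dims), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    to_dims = lambda s: [int(x) for x in s.split(",") if x.strip()]
    return m.group(1), to_dims(m.group(2)), to_dims(m.group(3))

def mismatches(expected, got):
    """List (dim_index, expected, got) for every dimension that differs.

    The shorter shape is padded with 1s, since the log prints the
    file's shape with trailing unit dimensions.
    """
    n = max(len(expected), len(got))
    e = expected + [1] * (n - len(expected))
    g = got + [1] * (n - len(got))
    return [(i, a, b) for i, (a, b) in enumerate(zip(e, g)) if a != b]

if __name__ == "__main__":
    name, expected, got = parse_shape_error(LINE)
    for i, a, b in mismatches(expected, got):
        print(f"{name}: dim {i} expected {a}, file has {b}")
```

For the line in my log, only the second dimension of blk.48.attn_q.weight differs: the loader expects 16384 and the file contains 8192.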