Eval bug: Gemma 4 31b fails to load with wrong tensor shape for blk.48.attn_q.weight, but works with pre-compiled binaries. 

### Name and Version

ggml_cuda_init: found 2 CUDA devices (Total VRAM: 49151 MiB):
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24575 MiB
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24575 MiB
version: 8667 (c08d28d08)
built with MSVC 19.44.35209.0 for x64

### Operating systems

Windows

### GGML backends

CUDA

### Hardware

Intel 10900k
2x RTX 3090

### Models

[Gemma 4 31b-it q8 from bartowski](https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF/blob/main/google_gemma-4-31B-it-Q8_0.gguf)
My SHA256 is good: 2BADB6EED44009D6790F7204FE9D37FDB7BE278BE206D6C75B5672E416D41716
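
For anyone who wants to rule out a corrupted download on their side, here is a minimal sketch of how the checksum can be verified with Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_expected` are made up for illustration, and the path in the usage comment is just my local path.

```python
import hashlib

# Checksum quoted above for google_gemma-4-31B-it-Q8_0.gguf.
EXPECTED = "2BADB6EED44009D6790F7204FE9D37FDB7BE278BE206D6C75B5672E416D41716"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a multi-gigabyte GGUF never has to fit in RAM.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_expected(path, expected=EXPECTED):
    """Compare the file's digest with the expected one, ignoring case."""
    return sha256_of_file(path) == expected.strip().upper()


# Usage (hypothetical local path):
# print(matches_expected(r"T:\models\google_gemma-4-31B-it-Q8_0.gguf"))
```

A matching digest only shows that the file is the one that was published; it says nothing about whether the local build can read it.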

### Problem description & steps to reproduce

Other models load and run without issue on my own build, but Gemma 4 fails to load.
The same Gemma 4 file loads and runs with the prebuilt binaries. They print the same warning, `str: cannot properly format tensor name output with suffix=weight bid=-1 xid=-1`, but loading then completes normally.

My build command:
```
cmake -B build -DGGML_CUDA=ON -DGGML_SCHED_MAX_COPIES="1"
cmake --build build --config Release -j 19
```
My launch command:
```
@echo off
set CUDA_VISIBLE_DEVICES=0,1
llama-server.exe ^
-m "T:\models\google_gemma-4-31B-it-Q8_0.gguf" ^
-ts 23,20.5 ^
-c 65536 ^
--ubatch-size 4096 ^
--checkpoint-every-n-tokens 2048 ^
--no-mmap
```

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console
load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
str: cannot properly format tensor name output with suffix=weight bid=-1 xid=-1
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.48.attn_q.weight' has wrong shape; expected   5376,  16384, got   5376,   8192,      1,      1
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'T:\models\google_gemma-4-31B-it-Q8_0.gguf'
srv    load_model: failed to load model, 'T:\models\google_gemma-4-31B-it-Q8_0.gguf'
srv   operator (): operator (): cleaning up before exit...
main: exiting due to model loading error
```
</details>
Logs with `--verbose`:

[logs.txt](https://github.com/user-attachments/files/26485968/logs.txt)
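
To make the mismatch in the `check_tensor_dims` line easier to read, here is a small sketch that parses that log line and reports which dimension differs. The regular expression and the helper names are my own and only match the message format shown in the log above. It also assumes that the trailing `1` entries in the "got" shape are padding up to four dimensions, so they are dropped before comparing.

```python
import re

LINE = (
    "llama_model_load: error loading model: check_tensor_dims: tensor "
    "'blk.48.attn_q.weight' has wrong shape; expected   5376,  16384, "
    "got   5376,   8192,      1,      1"
)

# Matches only the message format seen in the log above (an assumption).
PATTERN = re.compile(
    r"tensor '(?P<name>[^']+)' has wrong shape; "
    r"expected (?P<exp>[\d,\s]+?), got (?P<got>[\d,\s]+)"
)


def parse_dims(text):
    """Turn '  5376,  16384' into [5376, 16384]."""
    return [int(part) for part in text.split(",") if part.strip()]


def strip_trailing_ones(dims):
    """Drop trailing 1s, assumed here to be padding, keeping at least one dim."""
    dims = list(dims)
    while len(dims) > 1 and dims[-1] == 1:
        dims.pop()
    return dims


def diff_shapes(line):
    """Return (name, expected, got, mismatches), or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    expected = strip_trailing_ones(parse_dims(m["exp"]))
    got = strip_trailing_ones(parse_dims(m["got"]))
    # zip() stops at the shorter shape, so a rank difference is not reported here.
    mismatches = [(i, e, g) for i, (e, g) in enumerate(zip(expected, got)) if e != g]
    return m["name"], expected, got, mismatches


name, expected, got, mismatches = diff_shapes(LINE)
for axis, e, g in mismatches:
    print(f"{name}: dim {axis} expected {e}, got {g} (ratio {e / g:g})")
```

For the line above, the first dimension agrees (5376) and only the second differs: my build expects 16384 where the file has 8192, exactly a factor of two.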

Eval bug: Gemma 4 31b fails to load with wrong tensor shape for blk.48.attn_q.weight, but works with pre-compiled binaries. #21457