Eval bug: Gemma (4)'s final logit softcapping might not be working correctly

### Name and Version

```
llama-cli --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24124 MiB):
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB
version: 8647 (b069b10ab)
built with GNU 15.1.0 for Linux x86_64
```


### Operating systems

Linux

### GGML backends

CUDA

### Hardware

Intel i7 12th gen + RTX3090

### Models

Gemma 4 31B IT

### Problem description & steps to reproduce

From testing, it appears that Gemma 4's "Final Logit Softcapping" might not be taken into account during inference. This probably results in overconfident predictions and insensitivity to temperature.
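To illustrate why a missing softcap would produce overconfident predictions, here is a minimal sketch. It assumes the usual tanh form of final logit softcapping, `cap * tanh(logit / cap)`, as used by Gemma 2; I have not verified that Gemma 4 uses exactly this form. The logit values are made up for illustration.

```python
import math


def softcap(logits, cap):
    """Squash logits into [-cap, cap] (assumed form: cap * tanh(x / cap))."""
    return [cap * math.tanh(x / cap) for x in logits]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional sampling temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Made-up raw logits for four candidate tokens (illustration only).
raw_logits = [40.0, 10.0, 0.0, -5.0]

# Compare no softcap, the default cap of 30.0, and a lowered cap of 15.0.
for cap in (None, 30.0, 15.0):
    capped = raw_logits if cap is None else softcap(raw_logits, cap)
    probs = softmax(capped)
    print(f"cap={cap}: top prob={max(probs):.6f}, entropy={entropy(probs):.6f}")
```

Under these assumptions, a lower cap shrinks the gap between the top logit and the rest, so the distribution flattens much like it would at a higher temperature. Without any cap, the top token dominates almost completely.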

How to observe how final logit softcapping works:
1) Download the HF version of Gemma 4 from Google
2) Edit `config.json`, change `final_logit_softcapping` from the default value of 30.0 to a low value like 20.0 or 15.0
3) Perform inference via Transformers
4) Observe how the model quickly becomes incoherent, as if the temperature were very high.
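Step 2 above can be scripted. The sketch below is a hypothetical helper (`set_final_logit_softcapping` is not part of Transformers) that rewrites the key in `config.json`. It assumes the key may sit at the top level, inside a nested `text_config` block, or both, and updates it wherever it is already present.

```python
import json
from pathlib import Path


def set_final_logit_softcapping(config_path, new_cap):
    """Rewrite final_logit_softcapping in a HF config.json; return the old values."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    old_values = {}
    # Assumption: the key is at the top level, under "text_config", or both.
    candidates = (("top-level", config), ("text_config", config.get("text_config")))
    for label, section in candidates:
        if isinstance(section, dict) and "final_logit_softcapping" in section:
            old_values[label] = section["final_logit_softcapping"]
            section["final_logit_softcapping"] = float(new_cap)

    if not old_values:
        raise KeyError(f"final_logit_softcapping not found in {path}")

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_values
```

For example, `set_final_logit_softcapping("path/to/model/config.json", 15.0)` returns the previous value(s), so the default can be restored afterwards. The path here is a placeholder.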

How to observe that the same setting doesn't appear to have any effect in llama.cpp:
1) Try to override it in llama-server CLI settings, e.g. `--override-kv gemma4.final_logit_softcapping=float:15.0` ⇒ no change in outputs
2) Try to override it by editing the corresponding key in the GGUF file, e.g. with [this code](https://github.com/ggml-org/llama.cpp/blob/master/gguf-py/gguf/scripts/gguf_new_metadata.py) to a low value ⇒ no change
3) Try to edit the key in the original HF config.json file, then convert to GGUF ⇒ no change
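A more direct check than reading generated text would be to inspect the final logits themselves. Assuming the tanh form `cap * tanh(x / cap)`, every final logit must lie within `[-cap, cap]`, so any value beyond the configured cap suggests the cap was not applied with that value. The sketch below uses a hypothetical helper and made-up numbers; how the raw logits are extracted from llama.cpp is left out.

```python
def exceeds_softcap(logits, cap, tolerance=1e-3):
    """Return the logits whose magnitude is larger than cap (plus a small tolerance)."""
    return [x for x in logits if abs(x) > cap + tolerance]


# Made-up final logits, as if dumped from a run configured with cap = 15.0.
sample_logits = [12.0, -3.0, 22.5]
violations = exceeds_softcap(sample_logits, cap=15.0)
print("logits outside the cap:", violations)
```

Note that the check only works in one direction: logits that all stay within the bound do not prove that softcapping was applied.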

**Conclusion**: llama.cpp doesn't properly implement Gemma's final logit softcapping.
This issue might affect Gemma 2 and 3 as well (both also use logit softcapping), although I haven't tested them in depth.

### First Bad Commit

_No response_

### Relevant log output

No relevant outputs to report here. With a very low `final_logit_softcapping` value, generally speaking, model outputs should be completely incoherent.


Eval bug: Gemma (4)'s final logit softcapping might not be working correctly #21388