Eval bug: Gemma (4)'s final logit softcapping might not be working correctly #21388

@BugReporterZ

Name and Version

llama-cli --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24124 MiB):
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB
version: 8647 (b069b10)
built with GNU 15.1.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

Intel i7 12th gen + RTX3090

Models

Gemma 4 31B IT

Problem description & steps to reproduce

From testing, it appears that Gemma 4's "Final Logit Softcapping" might not be taken into account during inference. This probably results in excessively confident predictions and insensitivity to temperature.
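For reference, here is a minimal sketch of what final logit softcapping is expected to do, assuming Gemma 4 uses the same `cap * tanh(logits / cap)` form as earlier Gemma releases. The logit values are made up for illustration, and `softcap` and `softmax` are hypothetical helper names, not functions from Transformers or llama.cpp.

```python
import math

def softcap(logits, cap):
    """Squash each logit into (-cap, cap) via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw logits (assumption, not taken from a real model run).
raw_logits = [60.0, 40.0, 10.0, -5.0]

for cap in (None, 30.0, 15.0):
    capped = raw_logits if cap is None else softcap(raw_logits, cap)
    top_prob = softmax(capped)[0]
    print(f"cap={cap}: top-token probability = {top_prob:.4f}")
```

With these inputs the top-token probability shrinks as the cap is lowered, which is why a low cap should look like a high temperature, and why a missing cap would look like overconfidence.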

How to observe how final logit softcapping works:

  1. Download the HF version of Gemma 4 from Google
  2. Edit config.json, change final_logit_softcapping from the default value of 30.0 to a low value like 20.0 or 15.0
  3. Perform inference via Transformers
  4. Observe how the model quickly becomes incoherent, as if the temperature were very high.
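Step 2 above can be scripted. The following is a rough sketch that rewrites every `final_logit_softcapping` key it finds in a `config.json`. It searches recursively because the key may sit inside a nested sub-config; that nesting is an assumption, not something confirmed here. The function name is hypothetical.

```python
import json
from pathlib import Path

def set_final_logit_softcapping(config_path, new_value):
    """Set every `final_logit_softcapping` key in the JSON file.

    Returns the number of keys that were updated.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())

    def update(node):
        changed = 0
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "final_logit_softcapping":
                    node[key] = new_value  # overwrite in place
                    changed += 1
                else:
                    changed += update(value)  # descend into nested configs
        elif isinstance(node, list):
            for item in node:
                changed += update(item)
        return changed

    changed = update(config)
    path.write_text(json.dumps(config, indent=2))
    return changed
```

Calling `set_final_logit_softcapping("path/to/model/config.json", 15.0)` (placeholder path) should return at least 1. A return value of 0 would mean the key was not found and the edit had no effect.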

How to observe that the same setting doesn't appear to have any effect in llama.cpp:

  1. Override it in the llama-server CLI settings, e.g. --override-kv gemma4.final_logit_softcapping=float:15.0 ⇒ no change in outputs
  2. Override it by editing the corresponding key in the GGUF file (e.g. with this code), setting it to a low value ⇒ no change in outputs
  3. Edit the key in the original HF config.json file, then convert to GGUF ⇒ no change in outputs
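A more direct check than comparing generated text would be to look at the raw output logits. Because `tanh` is bounded, a softcapped logit can never exceed the cap in magnitude. The sketch below assumes the `cap * tanh(x / cap)` form and assumes you have some way of dumping the final logits for one position. How to obtain them from llama.cpp is left open here, and `exceeds_softcap` is a hypothetical helper.

```python
def exceeds_softcap(logits, cap, tolerance=1e-3):
    """Return True if any logit lies outside [-cap, cap] (plus a small tolerance).

    True means softcapping with this cap cannot have been applied.
    False is inconclusive: uncapped logits may simply happen to be small.
    """
    return max(abs(x) for x in logits) > cap + tolerance

# Made-up example values (assumption, not real model output).
print(exceeds_softcap([41.7, 12.3, -8.0], cap=15.0))  # a logit above the cap
print(exceeds_softcap([14.9, 12.3, -8.0], cap=15.0))  # all within the cap
```

If logits dumped with the cap overridden to 15.0 still contain values well above 15 in magnitude, that would support the claim that the setting is ignored.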

Conclusion: llama.cpp doesn't properly implement Gemma's final logit softcapping.
This issue might affect Gemma 2 and 3 as well (which also use logit softcapping), although I haven't tested this in depth.

First Bad Commit

No response

Relevant log output

No relevant output to report. With a very low final_logit_softcapping value, model outputs should generally be completely incoherent.
