
Max cache length issue with Gemma 3 #39711

@mitchelldehaven

Description

System Info

  • transformers version: 4.54.0
  • Platform: Linux-6.11.0-1011-nvidia-x86_64-with-glibc2.39
  • Python version: 3.11.13
  • Huggingface_hub version: 0.34.1
  • Safetensors version: 0.5.3
  • Accelerate version: 1.9.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.7.1+cu128 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: yes
  • GPU type: NVIDIA RTX PRO 6000 Blackwell Workstation Edition

Who can help?

@gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Generating with AutoModelForCausalLM and google/gemma-3-1b-it runs into the following error when using eager attention, as the logging messages instruct (rather than sdpa).

ValueError: Max cache length is not consistent across layers: [512, 512, 512, 512, 512, 741, 512, 512, 512, 512, 512, 741, 512, 512, 512, 512, 512, 741, 512, 512, 512, 512, 512, 741, 512, 512]

The offending code seems to be here in transformers/cache_utils.py

    @property
    def max_cache_len(self) -> int:
        """Return the maximum cache length of the cache"""
        values = [layer.max_cache_len for layer in self.layers]
        if len(set(values)) > 1:
            raise ValueError(f"Max cache length is not consistent across layers: {values}")
        return values[0]

This check seems inconsistent with Gemma 3's layer structure, where five consecutive layers use sliding-window attention with size 512 and every sixth layer uses full causal attention.
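The failure mode can be reproduced in isolation. The sketch below uses mock stand-ins (`MockLayer`, `MockCache` are illustrative, not the real transformers classes) with the same `max_cache_len` property logic as the quoted snippet, fed a Gemma 3-style layout of five sliding-window layers followed by one full-attention layer:

```python
from dataclasses import dataclass


@dataclass
class MockLayer:
    # Stand-in for a cache layer; only exposes the attribute the check reads.
    max_cache_len: int


class MockCache:
    # Stand-in cache reproducing the property from transformers/cache_utils.py.
    def __init__(self, layers):
        self.layers = layers

    @property
    def max_cache_len(self) -> int:
        """Return the maximum cache length of the cache"""
        values = [layer.max_cache_len for layer in self.layers]
        if len(set(values)) > 1:
            raise ValueError(f"Max cache length is not consistent across layers: {values}")
        return values[0]


# Gemma 3-style hybrid layout: five sliding-window layers (512), one full-attention layer.
layers = [MockLayer(512)] * 5 + [MockLayer(741)]
cache = MockCache(layers)
try:
    cache.max_cache_len
except ValueError as e:
    print(e)  # Max cache length is not consistent across layers: [512, 512, 512, 512, 512, 741]
```

A uniform cache (all layers the same length) passes the check; any hybrid sliding/full mix trips it.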

git blame shows this was changed recently in commit c338fd4.

This worked fine in the previous transformers version I was running, but I needed to update my version recently and this error started occurring. I'm happy to post a PR for a fix if this is determined to be a bug.
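One possible direction (purely a sketch of the idea, not a confirmed fix; the class below is an illustrative mock, not the real transformers cache) would be for `max_cache_len` to treat mixed per-layer lengths as valid for hybrid caches and report the largest one instead of raising:

```python
class HybridCacheSketch:
    # Illustrative mock: stores only the per-layer cache lengths.
    def __init__(self, layer_lengths):
        self.layer_lengths = layer_lengths

    @property
    def max_cache_len(self) -> int:
        # A hybrid (sliding-window + full attention) cache legitimately has
        # mixed per-layer lengths; the overall maximum is the length of the
        # full-attention layers.
        return max(self.layer_lengths)


cache = HybridCacheSketch([512] * 5 + [741])
print(cache.max_cache_len)  # 741
```

Whether returning the maximum is the right semantics (versus exposing per-layer lengths to callers) would be for the maintainers to decide.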

Expected behavior

That the model generates text as expected.
