System Info
- transformers version: 4.54.0
- Platform: Linux-6.11.0-1011-nvidia-x86_64-with-glibc2.39
- Python version: 3.11.13
- Huggingface_hub version: 0.34.1
- Safetensors version: 0.5.3
- Accelerate version: 1.9.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu128 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
- Using GPU in script?: yes
- GPU type: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
Who can help?
@gante
Information
Tasks
Reproduction
Generating with AutoModelForCausalLM and google/gemma3-1b-it runs into the following error when using eager attention (which the logging messages recommend for this model, rather than sdpa).
ValueError: Max cache length is not consistent across layers: [512, 512, 512, 512, 512, 741, 512, 512, 512, 512, 512, 741, 512, 512, 512, 512, 512, 741, 512, 512, 512, 512, 512, 741, 512, 512]
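A minimal script along the following lines reproduces the failure; the prompt, dtype, and generation length are placeholders, and the important parts are the model id and attn_implementation="eager" as described above.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma3-1b-it"  # model id as written above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # placeholder dtype
    attn_implementation="eager",  # eager, as the logging messages recommend
).to("cuda")

inputs = tokenizer("Write a short poem about caching.", return_tensors="pt").to(model.device)
# generate() raises the ValueError above once the cache's max_cache_len property is
# queried, because the sliding-window and global layers report different lengths.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))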
The offending code seems to be here in transformers/cache_utils.py:

@property
def max_cache_len(self) -> int:
    """Return the maximum cache length of the cache"""
    values = [layer.max_cache_len for layer in self.layers]
    if len(set(values)) > 1:
        raise ValueError(f"Max cache length is not consistent across layers: {values}")
    return values[0]
This check seems to be inconsistent with Gemma3's layer structure, where five layers use sliding-window attention with size 512 and every sixth layer uses full causal attention.
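That layer pattern can be seen directly on the config (assuming it exposes sliding_window and layer_types; the attribute names may differ between versions):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma3-1b-it")
print(cfg.sliding_window)  # 512 on this model
print(cfg.layer_types)     # five "sliding_attention" entries per "full_attention" entry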
The git blame shows this was changed recently in commit c338fd4.
This worked fine in the previous transformers version I was running, but I needed to update my version recently and this error started occurring. I'm happy to post a PR for a fix if this is determined to be a bug.
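In case it helps, one direction I could imagine for such a PR (just an untested sketch, not necessarily how the maintainers would want it handled) is to have the property report the largest per-layer value instead of raising on hybrid layouts:

@property
def max_cache_len(self) -> int:
    """Return the maximum cache length across layers (sketch: tolerate hybrid layouts)."""
    # Sliding-window layers report their window size and global layers report the full
    # length, so take the max rather than requiring all layers to agree.
    return max(layer.max_cache_len for layer in self.layers)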
Expected behavior
That the model generates text as expected.