Problem
Gemma 4 models (released April 2, 2026) use Gemma4ClippableLinear in their vision and audio towers. This module wraps a standard nn.Linear with optional input/output clamping but is not recognized by PEFT's LoRA dispatch, causing a ValueError when attempting fine-tuning:
```
ValueError: Target module Gemma4ClippableLinear(
  (linear): Linear(in_features=768, out_features=768, bias=False)
) is not supported.
```
This affects all Gemma 4 variants (E2B, E4B, 26B-A4B, 31B) when targeting modules like q_proj that appear in both the language model (standard nn.Linear) and vision/audio towers (Gemma4ClippableLinear).
Root Cause
Gemma4ClippableLinear is an nn.Module (not an nn.Linear subclass) that wraps a standard nn.Linear:
```python
import torch
from torch import nn


class Gemma4ClippableLinear(nn.Module):
    def __init__(self, config, in_features, out_features):
        super().__init__()
        self.use_clipped_linears = config.use_clipped_linears
        self.linear = nn.Linear(in_features, out_features, bias=False)
        if self.use_clipped_linears:
            # Clamping bounds are buffers that default to +/-inf.
            self.register_buffer("input_min", torch.tensor(-float("inf")))
            self.register_buffer("input_max", torch.tensor(float("inf")))
            self.register_buffer("output_min", torch.tensor(-float("inf")))
            self.register_buffer("output_max", torch.tensor(float("inf")))

    def forward(self, hidden_states):
        if self.use_clipped_linears:
            hidden_states = torch.clamp(hidden_states, self.input_min, self.input_max)
        hidden_states = self.linear(hidden_states)
        if self.use_clipped_linears:
            hidden_states = torch.clamp(hidden_states, self.output_min, self.output_max)
        return hidden_states
```
Because it doesn't subclass nn.Linear, dispatch_default doesn't match it, and _get_in_out_features can't extract dimensions from it.
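The mismatch can be reproduced without downloading a checkpoint. The sketch below uses a hypothetical ClippableLinearStandIn that mirrors the structure above (simplified to output clamping only; it is not the transformers class) and checks the two things the LoRA dispatch relies on: an nn.Linear instance, and in_features/out_features on the module itself.

```python
import torch
from torch import nn


class ClippableLinearStandIn(nn.Module):
    """Hypothetical stand-in mirroring Gemma4ClippableLinear (output clamping only)."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.register_buffer("output_min", torch.tensor(-1.0))
        self.register_buffer("output_max", torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.clamp(self.linear(x), self.output_min, self.output_max)


wrapper = ClippableLinearStandIn(768, 768)

# A dispatcher that checks isinstance(target, nn.Linear) skips the wrapper...
print(isinstance(wrapper, nn.Linear))  # False
# ...and the shape attributes live on the inner module, not on the wrapper.
print(hasattr(wrapper, "in_features"))  # False
print(wrapper.linear.in_features, wrapper.linear.out_features)  # 768 768
```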
Workaround
Users can work around this today with exclude_modules:
```python
from peft import LoraConfig

lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    exclude_modules=["vision_tower", "audio_tower"],
)
```
This avoids the ClippableLinear modules entirely but prevents fine-tuning vision/audio tower projections.
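To confirm which targeted names actually resolve to a wrapper rather than a plain nn.Linear (and therefore need excluding), the loaded model can be scanned with named_modules(). A minimal sketch, assuming a hypothetical find_unsupported_targets helper and a toy model; real Gemma 4 module paths will be longer.

```python
from torch import nn


class ClippableLinearStandIn(nn.Module):
    """Hypothetical stand-in for Gemma4ClippableLinear (wraps an inner nn.Linear)."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)

    def forward(self, x):
        return self.linear(x)


def find_unsupported_targets(model: nn.Module, target_names: list) -> list:
    """Return qualified names of targeted modules that are not plain nn.Linear."""
    hits = []
    for name, module in model.named_modules():
        # Match on the last path component, e.g. "...self_attn.q_proj" -> "q_proj".
        if name.rsplit(".", 1)[-1] in target_names and not isinstance(module, nn.Linear):
            hits.append(name)
    return hits


# Toy model: one language-model projection and one vision-tower projection.
model = nn.ModuleDict({
    "language_model": nn.ModuleDict({"q_proj": nn.Linear(16, 16)}),
    "vision_tower": nn.ModuleDict({"q_proj": ClippableLinearStandIn(8, 8)}),
})
print(find_unsupported_targets(model, ["q_proj", "k_proj", "v_proj", "o_proj"]))
# ['vision_tower.q_proj']
```

On a real model, the returned names show whether the exclude_modules entries cover every wrapper before get_peft_model is called.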
For users who need to target these modules, a monkey-patch works for basic training (but breaks merge/unmerge):
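```python
from peft.tuners.lora import model as lora_model
from transformers.models.gemma4.modeling_gemma4 import Gemma4ClippableLinear

# Keep a reference to the original factory.
_original = lora_model.LoraModel._create_new_module


@classmethod
def _patch(cls, lora_config, adapter_name, target, **kwargs):
    # Hand the inner nn.Linear to PEFT instead of the unsupported wrapper.
    # The resulting LoRA layer replaces the whole wrapper, so its clamping no longer runs.
    if isinstance(target, Gemma4ClippableLinear):
        return _original(lora_config, adapter_name, target.linear, **kwargs)
    return _original(lora_config, adapter_name, target, **kwargs)


# Apply before calling get_peft_model().
lora_model.LoraModel._create_new_module = _patch
```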
Analysis of a Proper Fix
We attempted several approaches and identified the code paths that need updating for full support:
1. dispatch_default in layer.py
Add an elif for wrapper modules with an inner .linear. The wrapper should be passed as base_layer (not the inner linear) to preserve the wrapper's forward behavior.
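A rough sketch of what this could look like: dispatch_sketch and LoraLinearSketch are hypothetical simplifications, not PEFT's dispatch_default or its LoRA Linear layer. The wrapper itself becomes base_layer, and the inner .linear is read only for its shapes.

```python
import torch
from torch import nn


class ClippableLinearStandIn(nn.Module):
    """Hypothetical stand-in for Gemma4ClippableLinear (output clamping only)."""

    def __init__(self, in_features, out_features, lo=-1.0, hi=1.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.register_buffer("output_min", torch.tensor(lo))
        self.register_buffer("output_max", torch.tensor(hi))

    def forward(self, x):
        return torch.clamp(self.linear(x), self.output_min, self.output_max)


class LoraLinearSketch(nn.Module):
    """Minimal LoRA layer: y = base_layer(x) + scaling * B(A(x))."""

    def __init__(self, base_layer, in_features, out_features, r=4, alpha=8):
        super().__init__()
        self.base_layer = base_layer  # the wrapper itself, so its forward still runs
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # usual LoRA init: zero delta at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base_layer(x) + self.scaling * self.lora_B(self.lora_A(x))


def dispatch_sketch(target, **lora_kwargs):
    """Hypothetical dispatch: plain nn.Linear, or a wrapper with an inner .linear."""
    if isinstance(target, nn.Linear):
        inner = target
    elif isinstance(getattr(target, "linear", None), nn.Linear):
        inner = target.linear  # used only to read the shapes
    else:
        return None  # unsupported; a real dispatcher would raise
    return LoraLinearSketch(target, inner.in_features, inner.out_features, **lora_kwargs)


wrapper = ClippableLinearStandIn(8, 8)
lora = dispatch_sketch(wrapper)
x = torch.randn(3, 8) * 10
print(type(lora.base_layer).__name__)  # ClippableLinearStandIn
print(torch.allclose(lora(x), wrapper(x)))  # True, because B starts at zero
```

One consequence of this design worth checking: in the sketch, lora_A sees the unclamped input and the LoRA delta is added after the wrapper's output clamp, so the adapted output is not itself clamped. Whether that is acceptable for the vision/audio towers is a modeling decision rather than a dispatch detail.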
2. _get_in_out_features in tuners_utils.py
Add support for extracting in_features/out_features from the inner .linear.
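A minimal sketch of the extra branch, assuming a hypothetical get_in_out_features_sketch helper; it covers only nn.Linear and the wrapper case, not the other layer types PEFT supports.

```python
from typing import Optional, Tuple

from torch import nn


class ClippableLinearStandIn(nn.Module):
    """Hypothetical stand-in for Gemma4ClippableLinear: shapes live on .linear."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)


def get_in_out_features_sketch(module: nn.Module) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: read shapes from the module, or from module.linear."""
    if isinstance(module, nn.Linear):
        return module.in_features, module.out_features
    inner = getattr(module, "linear", None)
    if isinstance(inner, nn.Linear):
        return inner.in_features, inner.out_features
    return None, None  # unknown layer type; the caller decides how to fail


print(get_in_out_features_sketch(nn.Linear(4, 6)))  # (4, 6)
print(get_in_out_features_sketch(ClippableLinearStandIn(768, 768)))  # (768, 768)
```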
3. merge() / unmerge() / merge_and_unload() in layer.py
These access self.get_base_layer().weight, which fails for wrappers that don't expose .weight directly (it's at .linear.weight). Either the wrapper needs a .weight property or get_base_layer() needs to drill through.
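Of the two options, a .weight property on the wrapper is the smaller change. The sketch below assumes a hypothetical ClippableLinearWithWeight (clamping omitted) and hypothetical merge_sketch/unmerge_sketch helpers that apply W <- W +/- scaling * B @ A, then checks that merging reproduces the unmerged output and that unmerging restores the weight.

```python
import torch
from torch import nn


class ClippableLinearWithWeight(nn.Module):
    """Hypothetical wrapper exposing its inner weight as .weight (clamping omitted)."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)

    @property
    def weight(self) -> nn.Parameter:
        # Read-only alias: in-place updates land on self.linear.weight.
        return self.linear.weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


@torch.no_grad()
def merge_sketch(base, lora_A, lora_B, scaling):
    base.weight.add_(scaling * (lora_B @ lora_A))  # W <- W + s * B @ A


@torch.no_grad()
def unmerge_sketch(base, lora_A, lora_B, scaling):
    base.weight.sub_(scaling * (lora_B @ lora_A))  # W <- W - s * B @ A


torch.manual_seed(0)
base = ClippableLinearWithWeight(8, 6)
A = torch.randn(2, 8)  # lora_A weight, shape (r, in_features)
B = torch.randn(6, 2)  # lora_B weight, shape (out_features, r)
scaling = 2.0
x = torch.randn(4, 8)

with torch.no_grad():
    original = base.weight.clone()
    expected = base(x) + scaling * (x @ A.T @ B.T)  # unmerged LoRA output

merge_sketch(base, A, B, scaling)
print(torch.allclose(base(x), expected, atol=1e-5))  # True
unmerge_sketch(base, A, B, scaling)
print(torch.allclose(base.weight, original, atol=1e-6))  # True
```

With clamping active, the equivalence is no longer exact: unmerged, the wrapper computes clamp(W x) and the delta is added afterwards from the raw input, while merged it computes clamp((W + scaling * B @ A) x) on the input-clamped x. Merging is exact only when the clamp bounds are never reached, for example with the default infinite bounds.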
4. Weight-based initializers
olora, pissa, corda, loftq, orthogonal, lora_ga all access self.get_base_layer().weight. Same issue as merge/unmerge.
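One way to avoid patching each initializer separately is a single resolver that all of them call. The sketch below assumes a hypothetical resolve_base_weight helper and uses a PiSSA-style SVD split (the LoRA scaling factor is ignored, so this is not PEFT's exact pissa init) to show that a weight-based initializer works on a wrapper whose weight sits at .linear.weight.

```python
import torch
from torch import nn


class ClippableLinearStandIn(nn.Module):
    """Hypothetical stand-in: the weight lives at .linear.weight, not at .weight."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)


def resolve_base_weight(layer: nn.Module) -> torch.Tensor:
    """Hypothetical helper: return layer.weight, falling back to layer.linear.weight."""
    weight = getattr(layer, "weight", None)
    inner = getattr(layer, "linear", None)
    if weight is None and isinstance(inner, nn.Linear):
        weight = inner.weight
    if weight is None:
        raise TypeError(f"cannot locate a base weight on {type(layer).__name__}")
    return weight


@torch.no_grad()
def pissa_style_init(layer: nn.Module, r: int):
    """PiSSA-style split: top-r SVD part becomes B @ A, the residual stays in W."""
    weight = resolve_base_weight(layer)
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    sqrt_s = S[:r].sqrt()
    lora_A = sqrt_s[:, None] * Vh[:r]  # (r, in_features)
    lora_B = U[:, :r] * sqrt_s[None, :]  # (out_features, r)
    weight.sub_(lora_B @ lora_A)  # base weight now holds the residual
    return lora_A, lora_B


torch.manual_seed(0)
layer = ClippableLinearStandIn(8, 6)
original = layer.linear.weight.detach().clone()
lora_A, lora_B = pissa_style_init(layer, r=2)
print(lora_A.shape, lora_B.shape)  # torch.Size([2, 8]) torch.Size([6, 2])
print(torch.allclose(layer.linear.weight + lora_B @ lora_A, original, atol=1e-5))  # True
```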
Environment
- transformers 5.5.0
- peft 0.18.2.dev0 (latest main)
- Gemma 4 models: google/gemma-4-E2B-it, google/gemma-4-E4B-it, google/gemma-4-26B-A4B-it, google/gemma-4-31B-it
Tested With
- DPO training on Gemma 4 E2B-it with TRL 1.0.0 (successful with monkey-patch)
- SFT training (not tested, but the same issue is expected)
- GRPO training (not tested, but the same issue is expected)