[LoRA][Gemma4] Support vision tower LoRA #42662

Open
linitra24 wants to merge 18 commits into vllm-project:main from linitra24:gemma4-mm-lora

@linitra24 linitra24 commented May 14, 2026

Contributor

This PR adds the remaining LoRA plumbing needed for Gemma4 multimodal LoRA support.

After #43798, Gemma4-MM vision linear layers are already converted through the Transformers backend path, so this PR no longer reimplements the Gemma4 vision tower. Instead, it focuses on the runtime LoRA mapping and token-counting pieces needed by Gemma4 image/video/audio inputs.

Main changes:

  • Add a multimodal LoRA token-count interface so models can report separate tower and connector token counts.
  • Update Gemma4-MM to report modality-specific LoRA token counts for image, video, and audio inputs.
  • Size multimodal LoRA wrappers using the largest tower/connector token budget across modalities.
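The sizing rule in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's actual interface: `LoRATokenCounts` and `max_lora_token_budget` are hypothetical names, the per-modality numbers are made up, and taking the tower and connector maxima independently is an assumption about how "largest budget across modalities" is computed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LoRATokenCounts:
    """Hypothetical container: tokens seen by the tower vs. the connector."""

    tower: int
    connector: int


def max_lora_token_budget(per_modality: Dict[str, LoRATokenCounts]) -> LoRATokenCounts:
    """Take the largest tower and connector counts independently across modalities."""
    if not per_modality:
        return LoRATokenCounts(tower=0, connector=0)
    return LoRATokenCounts(
        tower=max(c.tower for c in per_modality.values()),
        connector=max(c.connector for c in per_modality.values()),
    )


# Illustrative numbers only; they are not taken from Gemma4.
counts = {
    "image": LoRATokenCounts(tower=1024, connector=256),
    "video": LoRATokenCounts(tower=4096, connector=512),
    "audio": LoRATokenCounts(tower=750, connector=750),
}
budget = max_lora_token_budget(counts)
```

Sizing the wrappers once from the per-component maximum means a single allocation can serve any modality, at the cost of over-provisioning for the smaller ones.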

Test Plan

Additional end-to-end tests for real Gemma4 vision LoRA adapters should also be added in a follow-up.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting a before/after comparison of the results, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80413b1c2b

Comment thread vllm/model_executor/models/gemma4_mm.py Outdated
    padding_positions: torch.Tensor,
) -> torch.Tensor:
    pixel_values = 2 * (pixel_values - 0.5)
    hidden_states = self.input_proj(pixel_values.to(self.input_proj.weight.dtype))

P2: Avoid reading weight on quantized linear layers

When Gemma4 is loaded with a quantization method whose LinearMethod replaces weight (for example GGUF registers qweight/qweight_type instead of weight), image or video requests will fail here before the vision tower runs because self.input_proj.weight does not exist. Since this commit now passes quant_config into the vision tower, the patch embedder should not use the vLLM linear layer's weight attribute to choose the activation dtype.
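One way to address this is to resolve the activation dtype without assuming a `weight` attribute exists. The sketch below uses stand-in classes rather than vLLM's real layers: `resolve_activation_dtype`, `DenseLinear`, `QuantizedLinear`, and the `params_dtype` fallback argument are all hypothetical names, and dtypes are plain strings so the example runs without PyTorch.

```python
from typing import Any


class _Param:
    """Minimal stand-in for a tensor that only carries a dtype."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype


class DenseLinear:
    """Hypothetical unquantized layer: exposes `weight` as usual."""

    def __init__(self, dtype: str) -> None:
        self.weight = _Param(dtype)


class QuantizedLinear:
    """Hypothetical quantized layer: registers `qweight`, has no `weight`."""

    def __init__(self) -> None:
        self.qweight = _Param("uint8")


def resolve_activation_dtype(layer: Any, params_dtype: str) -> str:
    """Use the weight dtype when present, else fall back to a configured dtype."""
    weight = getattr(layer, "weight", None)
    if weight is None:
        # Quantized path: packed weights do not describe the activation dtype.
        return params_dtype
    return weight.dtype


dense_dtype = resolve_activation_dtype(DenseLinear("bfloat16"), params_dtype="float16")
quant_dtype = resolve_activation_dtype(QuantizedLinear(), params_dtype="float16")
```

In a real module the fallback would more likely be a dtype recorded at construction time, so the forward pass never has to inspect the layer's parameters.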

mergify Bot commented May 14, 2026

Contributor

Hi @linitra24, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request replaces the vision tower in the Gemma 4 multimodal model with native vLLM modules, including custom implementations for patch embedding, pooling, and multidimensional rotary embeddings. This change enables better integration with vLLM features like LoRA. A compatibility issue was identified where the use of the "strict=True" argument in zip() would cause failures on Python 3.9, which is currently supported by vLLM.

Comment thread vllm/model_executor/models/gemma4_mm.py Outdated
        unsqueeze_dim=unsqueeze_dim,
    )
    for hidden_part, cos_part, sin_part in zip(
        hidden_parts, cos_parts, sin_parts, strict=True

Contributor

Priority: high

The strict=True argument in zip() was introduced in Python 3.10. Since vLLM supports Python 3.9, this will cause a TypeError on older Python versions. Please remove the strict=True argument.

Suggested change
-    hidden_parts, cos_parts, sin_parts, strict=True
+    hidden_parts, cos_parts, sin_parts
 )
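Dropping `strict=True` also drops the length check it provided. If that check matters, a small helper can keep it on interpreters older than 3.10. The sketch below is one possible approach, not something taken from the PR: `zip_equal` is a hypothetical name rather than a vLLM utility, and it requires sized inputs such as lists or tuples.

```python
from typing import Iterator, Sequence, Tuple


def zip_equal(*seqs: Sequence) -> Iterator[Tuple]:
    """Zip sequences, raising ValueError if their lengths differ."""
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        raise ValueError(
            f"zip_equal() arguments have different lengths: {sorted(lengths)}"
        )
    return zip(*seqs)


pairs = list(zip_equal([1, 2, 3], ["a", "b", "c"]))

try:
    list(zip_equal([1, 2], ["a"]))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Unlike `zip(..., strict=True)`, which detects a mismatch lazily during iteration, this helper checks lengths up front, so it cannot be used with generators.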

mergify Bot commented May 14, 2026

Contributor

Documentation preview: https://vllm--42662.org.readthedocs.build/en/42662/

@mergify mergify Bot added the documentation label May 14, 2026
@linitra24 linitra24 changed the title from "Gemma4 mm lora" to "[LoRA][Gemma4] Support vision tower LoRA" May 14, 2026
@jeejeelee jeejeelee self-assigned this May 15, 2026

@jeejeelee jeejeelee left a comment

Member

To help this feature land faster, maybe you can split the vision tower support into a separate PR.

mergify Bot commented Jun 3, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @linitra24.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 3, 2026
@linitra24 linitra24 requested a review from njhill as a code owner June 7, 2026 15:48
@mergify mergify Bot added v1 and removed needs-rebase labels Jun 7, 2026
linitra24 added 5 commits June 7, 2026 15:52
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>

mergify Bot commented Jun 7, 2026

Contributor

Documentation preview: https://vllm--42662.org.readthedocs.build/en/42662/

@linitra24 linitra24 requested a review from jeejeelee June 11, 2026 14:47

Labels

documentation, v1

2 participants