[New Model] Gemma 4#21952
Merged
Kangyan-Zhou merged 140 commits intosgl-project:mainfrom Apr 7, 2026
Merged
Conversation
The HF reference applies layer_scalar to every Gemma4DecoderLayer, not just full-attention layers. New checkpoints have non-trivial scalar values on SWA layers that were being silently ignored. Made-with: Cursor
Gate the two-buffer path on sliding_window_size to make intent explicit, and rewrite comment to explain the kernel's // Lv stride constraint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kp/gemma4 multimodal support
5 tasks
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 6, 2026
Collaborator
Author
|
/rerun-failed-ci |
Collaborator
Author
|
/rerun-failed-ci |
1 similar comment
Collaborator
Author
|
/rerun-failed-ci |
3 tasks
Collaborator
Author
|
/rerun-failed-ci |
This was referenced Apr 7, 2026
5 tasks
Fridge003
pushed a commit
that referenced
this pull request
Apr 7, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Pengyu Chen <pychen96@gmail.com> Co-authored-by: kpham-sgl <khoa.pham@radixark.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Andy Luo <andy.luo@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>
5 tasks
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 9, 2026
Add accuracy tests for Google Gemma 4 models on AMD GPUs, covering both MI30x (MI325/MI300X) and MI35x platforms with both default ROCm and ROCm 7.2 workflows. Models tested: - google/gemma-4-E4B-it (Dense, ~4B params, TP=1) - google/gemma-4-31B-it (Dense, 31B params, TP=1) All models use --attention-backend triton (required for bidirectional image-token attention on AMD GPUs) and --reasoning-parser/--tool-call-parser gemma4 per the upstream model PR #21952. Test files: - test/registered/amd/accuracy/mi30x/test_gemma4_eval_amd.py - test/registered/amd/accuracy/mi35x/test_gemma4_eval_mi35x.py Workflow jobs added: - nightly-accuracy-2-gpu-gemma4 (MI30x, 2-GPU runner) - nightly-8-gpu-mi35x-gemma4 (MI35x, 8-GPU runner) - Corresponding ROCm 7.2 variants Depends on: #21952 (Gemma 4 model support) Ref: https://www.amd.com/en/developer/resources/technical-articles/2026/day-0-support-for-gemma-4-on-amd-processors-and-gpus.html
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 9, 2026
The CI Docker image ships an older transformers that doesn't recognize the gemma4 architecture. Install from the specific commit required by the Gemma 4 model PR (#21952).
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 9, 2026
The CI Docker image has an older transformers that doesn't recognize the gemma4 model architecture. Install from the specific commit required by model PR #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 9, 2026
Add accuracy tests for Google Gemma 4 models on AMD GPUs (MI30x and MI35x) with both default ROCm and ROCm 7.2 workflows. Models tested: - google/gemma-4-E4B-it (Dense ~4B, TP=1) - google/gemma-4-31B-it (Dense 31B, TP=1) Server config: --attention-backend triton (required for bidirectional image-token attention on AMD GPUs per AMD Day 0 article). Each CI job installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 10, 2026
Add accuracy tests for Google Gemma 4 models on AMD GPUs (MI30x and MI35x) with both default ROCm and ROCm 7.2 workflows. Models tested: - google/gemma-4-E4B-it (Dense ~4B, TP=1) - google/gemma-4-31B-it (Dense 31B, TP=1) Server config: --attention-backend triton (required for bidirectional image-token attention on AMD GPUs per AMD Day 0 article). Each CI job installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 10, 2026
Add accuracy tests for Google Gemma 4 models on AMD GPUs (MI30x and MI35x) with both default ROCm and ROCm 7.2 workflows. Models tested: - google/gemma-4-E4B-it (Dense ~4B, TP=1) - google/gemma-4-31B-it (Dense 31B, TP=1) Server config: --attention-backend triton (required for bidirectional image-token attention on AMD GPUs per AMD Day 0 article). Each CI job installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 10, 2026
Add Gemma 4 accuracy test as a step within existing CI jobs rather than standalone jobs: - 2-GPU job (nightly-accuracy-2-gpu): new step after GSM8K eval - MI35x 8-GPU job (nightly-accuracy-8-gpu-mi35x): new step after GPT-OSS Tests google/gemma-4-31B-it (Dense 31B, TP=1) on mgsm_en with --attention-backend triton and threshold 0.90 (observed 0.984). Each step installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 13, 2026
Add Gemma 4 accuracy test as a step within existing CI jobs rather than standalone jobs: - 2-GPU job (nightly-accuracy-2-gpu): new step after GSM8K eval - MI35x 8-GPU job (nightly-accuracy-8-gpu-mi35x): new step after GPT-OSS Tests google/gemma-4-31B-it (Dense 31B, TP=1) on mgsm_en with --attention-backend triton and threshold 0.90 (observed 0.984). Each step installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 13, 2026
Add Gemma 4 accuracy test as a step within existing CI jobs rather than standalone jobs: - 2-GPU job (nightly-accuracy-2-gpu): new step after GSM8K eval - MI35x 8-GPU job (nightly-accuracy-8-gpu-mi35x): new step after GPT-OSS Tests google/gemma-4-31B-it (Dense 31B, TP=1) on mgsm_en with --attention-backend triton and threshold 0.90 (observed 0.984). Each step installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 13, 2026
Add Gemma 4 accuracy test as a step within existing CI jobs rather than standalone jobs: - 2-GPU job (nightly-accuracy-2-gpu): new step after GSM8K eval - MI35x 8-GPU job (nightly-accuracy-8-gpu-mi35x): new step after GPT-OSS Tests google/gemma-4-31B-it (Dense 31B, TP=1) on mgsm_en with --attention-backend triton and threshold 0.90 (observed 0.984). Each step installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 13, 2026
Add Gemma 4 accuracy test as a step within the existing 2-GPU accuracy job (nightly-accuracy-2-gpu) for both default ROCm and ROCm 7.2 workflows. Tests google/gemma-4-31B-it (Dense 31B, TP=1) on mgsm_en with --attention-backend triton and threshold 0.90 (observed 0.984). Step uses if:always() to run even if prior GSM8K step fails. Each step installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 13, 2026
Add Gemma 4 accuracy test as a step within the existing 2-GPU accuracy job (nightly-accuracy-2-gpu) for both default ROCm and ROCm 7.2 workflows. Tests google/gemma-4-31B-it (Dense 31B, TP=1) on mgsm_en with --attention-backend triton and threshold 0.90 (observed 0.984). Step uses if:always() to run even if prior GSM8K step fails. Each step installs transformers from the commit required by #21952.
michaelzhang-ai
added a commit
that referenced
this pull request
Apr 13, 2026
Add Gemma 4 accuracy test as a step within the existing 2-GPU accuracy job (nightly-accuracy-2-gpu) for both default ROCm and ROCm 7.2 workflows. Tests google/gemma-4-31B-it (Dense 31B, TP=1) on mgsm_en with --attention-backend triton and threshold 0.90 (observed 0.984). Step uses if:always() to run even if prior GSM8K step fails. Each step installs transformers from the commit required by #21952.
yhyang201
pushed a commit
to yhyang201/sglang
that referenced
this pull request
Apr 22, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Pengyu Chen <pychen96@gmail.com> Co-authored-by: kpham-sgl <khoa.pham@radixark.ai> Co-authored-by: Andy Luo <andy.luo@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>
Merged
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Add Gemma 4 model support to SGLang. Gemma 4 is Google's next-generation family of open models featuring Dense and MoE architectures, multimodal support (text, image, audio), hybrid reasoning, and native tool calling.
Supported Models:
Installation
Usage
Launch Server
Basic Chat
Vision
Reasoning (Thinking Mode)
Thinking is not enabled by default — pass
chat_template_kwargs: {"enable_thinking": true}to activate:Tool Calling
Accuracy Tests
MMLU (gemma-4-26B-A4B-it, H200)
GSM8K (gemma-4-26B-A4B-it, H200)
MMMU (gemma-4-26B-A4B-it, H200)
Speed Tests and Profiling
See full benchmark results in the SGLang Cookbook - Gemma 4.
Modifications
gemma4_causal.py,gemma4_mm.py,gemma4_vision.py,gemma4_audio.pyGemma4ForCausalLMandGemma4ForConditionalGenerationGemma4SGLangProcessormultimodal processor (image + audio)gemma4reasoning parser (<|channel>/<channel|>tokens)gemma4tool call parser (<|tool_call>/<tool_call|>tokens with streaming)Checklist
Review and Merge Process
/tag-and-rerun-ci,/tag-run-ci-label,/rerun-failed-ci