[Perf] Precompute gemma_weight to avoid redundant add on every forward #22673

Merged
ispobock merged 1 commit into sgl-project:main from Chen-0210:gemmarmsnorm-precompute-clean
Apr 17, 2026

Conversation

Contributor

@Chen-0210 Chen-0210 commented Apr 13, 2026

Motivation

GemmaRMSNorm computes `weight + 1.0` on every forward call in `forward_hip` and `forward_with_allreduce_fusion`. This repeated tensor addition is unnecessary overhead, since the result never changes between calls.

Modifications

Replace the runtime `weight + 1.0` computation with the cached `gemma_weight` in `forward_hip` and `forward_with_allreduce_fusion`.
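To illustrate the idea, here is a minimal hedged sketch of the optimization (not the actual sglang implementation; class shape and names are simplified): Gemma's RMSNorm scales by `1 + weight`, so the sum can be cached once at construction instead of being materialized on every forward pass.

```python
import torch
from torch import nn


class GemmaRMSNorm(nn.Module):
    """Simplified sketch of a Gemma-style RMSNorm that caches weight + 1.0.

    Gemma parameterizes the norm scale as (1 + weight), so the addition
    can be done once and stored as a buffer rather than repeated per call.
    """

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Gemma initializes weight to zero, i.e. an effective scale of 1.
        self.weight = nn.Parameter(torch.zeros(hidden_size))
        # Precompute weight + 1.0 once; forward reads this cached buffer.
        self.register_buffer("gemma_weight", self.weight.data + 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Standard RMSNorm in float32 for numerical stability.
        var = x.float().pow(2).mean(-1, keepdim=True)
        x_norm = (x.float() * torch.rsqrt(var + self.eps)).to(x.dtype)
        # Uses the cached buffer instead of computing weight + 1.0 here.
        return x_norm * self.gemma_weight
```

The buffer must be refreshed whenever `weight` is loaded or updated, which is what the weight-loader discussion below is about.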

Accuracy Tests

python -m sglang.launch_server \
  --model-path /Qwen/Qwen3.5-397B-A17B/ \
  --tp-size 8 \
  --mamba-scheduler-strategy extra_buffer \
  --speculative-algo NEXTN \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4

python3 benchmark/gsm8k/bench_sglang.py --parallel 1000 --num-questions 1000

100%|██████████| 1000/1000 [02:32<00:00,  6.55it/s]
Accuracy: 0.959
Invalid: 0.011
Latency: 152.584 s

Speed Tests and Profiling

Speed Tests

python -m sglang.launch_server \
  --model-path /Qwen/Qwen3.5-397B-A17B/ \
  --tp-size 8 \
  --mamba-scheduler-strategy extra_buffer \
  --speculative-algo NEXTN \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4

python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 256 --random-input-len 1024 --random-output-len 1024 --random-range-ratio 1 --max-concurrency 8..32
concurrency   before E2E (ms)   after E2E (ms)
8             6807.02           6762.07
16            9887.29           9642.32
32            13988.34          13954.73

Profiling

[profiling screenshot attached to the PR]

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@Chen-0210 Chen-0210 changed the title [Perf] Precompute GemmaRMSNorm gemma_weight to avoid redundant add on every forward [Perf] Precompute gemma_weight to avoid redundant add on every forward Apr 13, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request optimizes Gemma layer normalization by precomputing gemma_weight (the standard weight plus 1.0) and storing it as a buffer, rather than recalculating it during every forward pass. A critical issue was identified in the weight loader, where self.gemma_weight is reassigned rather than updated in place; this would break the buffer's connection to the module and cause issues when moving the model between devices.
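The reassignment pitfall the bot flags can be demonstrated with a small hedged sketch (hypothetical classes, not the PR's actual code): a cached tensor stored as a plain attribute is invisible to `module.to(device)` and `state_dict()`, whereas a registered buffer refreshed in place with `copy_()` stays tracked by the module.

```python
import torch
from torch import nn


class NormPlainAttr(nn.Module):
    """Anti-pattern: caches the scale as a plain attribute.

    Plain tensor attributes are not tracked by nn.Module, so
    module.to(device) will not move the cache and state_dict()
    will not save it.
    """

    def __init__(self, hidden_size: int = 4):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(hidden_size))
        self.gemma_weight = self.weight.data + 1.0  # untracked tensor


class NormBuffer(nn.Module):
    """Registers the cache as a buffer and refreshes it in place."""

    def __init__(self, hidden_size: int = 4):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(hidden_size))
        self.register_buffer("gemma_weight", self.weight.data + 1.0)

    def load_weight(self, loaded: torch.Tensor) -> None:
        self.weight.data.copy_(loaded)
        # In-place copy keeps the buffer registration (and its device
        # placement) intact after the weights are loaded.
        self.gemma_weight.copy_(self.weight.data + 1.0)
```

With the buffer variant, device moves and checkpointing continue to see `gemma_weight`; the plain-attribute variant silently loses it.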

Comment thread python/sglang/srt/layers/layernorm.py
@Chen-0210
Contributor Author

/tag-and-rerun-ci

@Chen-0210 Chen-0210 force-pushed the gemmarmsnorm-precompute-clean branch from 3bd1362 to fa5120d on April 14, 2026 03:07
@Chen-0210
Contributor Author

/tag-and-rerun-ci


@Chen-0210
Contributor Author

/tag-and-rerun-ci


@ispobock
Collaborator

/rerun-stage stage-c-test-8-gpu-h200

@github-actions
Contributor

✅ Triggered stage-c-test-8-gpu-h200 to run independently (skipping dependencies). View workflow run

@ispobock ispobock merged commit 2bac219 into sgl-project:main Apr 17, 2026
492 of 570 checks passed
@Chen-0210 Chen-0210 deleted the gemmarmsnorm-precompute-clean branch April 18, 2026 03:47