
[diffusion] Diffusion norm fusion for z-image#18762

Merged
mickqian merged 9 commits into sgl-project:main from qimcis:diffusion-norm-fusion-for-zimage
Apr 4, 2026

@qimcis (Contributor) commented Feb 13, 2026

Motivation

Speed up Z-Image DiT modulation by using a fused kernel for the residual-form path: residual + tanh(gate) * rmsnorm(x).

The initial kernel was authored by Yihan Chen (@yingluosanqian).

Modifications

  • Kernel by Yihan: fused_norm_tanh_mul_add, a CuTeDSL kernel computing norm(x) * tanh(scale) + shift; here it is used in residual form as residual + tanh(gate) * rmsnorm(x).
  • Added a fused helper and wired the Z-Image attention and FFN modulation paths to use it.
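To pin down the semantics, here is a minimal unfused sketch in plain Python of what the fused op computes for a single row. It is not the CuTeDSL kernel or the helper added in this PR: the function names (rms_norm, norm_tanh_mul_add_ref, gated_residual_ref), the optional RMSNorm weight, and the default eps=1e-6 are hypothetical choices for illustration, and norm is assumed to be RMSNorm as in the residual form above. The last function shows how the kernel's general form specializes to the residual form by passing gate as scale and residual as shift.

```python
import math
from typing import List, Optional


def rms_norm(x: List[float], weight: Optional[List[float]] = None,
             eps: float = 1e-6) -> List[float]:
    """RMSNorm over one row: x / sqrt(mean(x^2) + eps), optionally times a weight.

    The weight and eps defaults are assumptions, not taken from the PR.
    """
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    out = [v * inv_rms for v in x]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    return out


def norm_tanh_mul_add_ref(x: List[float], scale: List[float], shift: List[float],
                          weight: Optional[List[float]] = None,
                          eps: float = 1e-6) -> List[float]:
    """Unfused reference for the kernel's general form: norm(x) * tanh(scale) + shift."""
    normed = rms_norm(x, weight, eps)
    return [n * math.tanh(s) + b for n, s, b in zip(normed, scale, shift)]


def gated_residual_ref(residual: List[float], x: List[float], gate: List[float],
                       weight: Optional[List[float]] = None,
                       eps: float = 1e-6) -> List[float]:
    """Residual form: residual + tanh(gate) * rmsnorm(x).

    Obtained from the general form by passing gate as `scale` and residual as `shift`.
    """
    return norm_tanh_mul_add_ref(x, scale=gate, shift=residual, weight=weight, eps=eps)


# Example: a single 4-element row with a moderate gate.
print(gated_residual_ref(residual=[1.0, 2.0, 3.0, 4.0],
                         x=[1.0, -1.0, 1.0, -1.0],
                         gate=[0.5, 0.5, 0.5, 0.5]))
```

A fused kernel performs the RMS reduction, tanh, multiply, and add in one pass over the row instead of launching separate ops that write intermediates such as rmsnorm(x) and tanh(gate) to memory; a reference like this is mainly useful as a numerical baseline when testing such a kernel.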

Benchmarking and Profiling

1x NVIDIA H100 80GB, Tongyi-MAI/Z-Image-Turbo

sglang generate --model-path=Tongyi-MAI/Z-Image-Turbo --log-level=info --prompt='A fantasy landscape with mountains and a river, detailed, vibrant colors' --width=1024 --height=1024 --num-inference-steps=9 --guidance-scale=0.0 --seed=42 --save-output --enable-torch-compile --warmup --dit-cpu-offload false --text-encoder-cpu-offload false

Baseline: main@4c6afbeea

Summary (Mean of 10 Runs)

| Metric | Baseline | New | Δ (%) |
|---|---|---|---|
| E2E (ms) | 2,041.00 | 1,930.00 | -5.44% |
| Throughput (req/s) | 0.490 | 0.518 | +5.75% |
| Denoising Stage (ms) | 1,918.30 | 1,819.61 | -5.14% |
| Avg Denoise Step (ms) | 212.83 | 201.84 | -5.16% |
| Text Encoding (ms) | 114.19 | 102.80 | -9.97% |
| Peak Memory (MB) | 32,418.47 | 32,487.40 | +0.21% |

Run-to-Run E2E Consistency

| Run | E2E Baseline (ms) | E2E New (ms) | Δ (%) |
|---|---|---|---|
| r1 | 2,000 | 1,950 | -2.50% |
| r2 | 1,990 | 1,920 | -3.52% |
| r3 | 2,040 | 1,890 | -7.35% |
| r4 | 2,060 | 1,890 | -8.25% |
| r5 | 2,080 | 1,930 | -7.21% |
| r6 | 2,080 | 1,920 | -7.69% |
| r7 | 2,030 | 1,970 | -2.96% |
| r8 | 1,990 | 1,880 | -5.53% |
| r9 | 2,050 | 2,010 | -1.95% |
| r10 | 2,090 | 1,940 | -7.18% |
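As a sanity check on the arithmetic, the short script below recomputes the per-run Δ column and the mean E2E and throughput rows of the summary from the per-run values in the run-to-run table above. It assumes one request per run, so that throughput is 1000 / mean E2E (ms); that assumption is not stated in the PR, but it reproduces the reported 0.490 and 0.518 req/s. The helper name pct_change is hypothetical.

```python
from statistics import mean

# Per-run end-to-end latencies (ms) copied from the run-to-run table above.
baseline_ms = [2000, 1990, 2040, 2060, 2080, 2080, 2030, 1990, 2050, 2090]
new_ms = [1950, 1920, 1890, 1890, 1930, 1920, 1970, 1880, 2010, 1940]


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Per-run deltas, rounded to two decimals like the table.
per_run = [round(pct_change(b, n), 2) for b, n in zip(baseline_ms, new_ms)]

# Mean E2E latency and its relative change.
mean_base, mean_new = mean(baseline_ms), mean(new_ms)
e2e_delta = round(pct_change(mean_base, mean_new), 2)

# Assumption: one request per run, so throughput = 1000 / mean E2E latency (ms).
tput_base, tput_new = 1000 / mean_base, 1000 / mean_new
tput_delta = round(pct_change(tput_base, tput_new), 2)

print(per_run)
print(mean_base, mean_new, e2e_delta)
print(round(tput_base, 3), round(tput_new, 3), tput_delta)
```

This also explains why throughput improves by +5.75% while E2E latency drops by only 5.44%: throughput is the reciprocal of latency, and relative changes of reciprocals are not symmetric (2041/1930 - 1 versus 1930/2041 - 1).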

Precision: OK
Baseline: [image: before]
After: [image: after]



github-actions (bot) added the diffusion (SGLang Diffusion) label Feb 13, 2026
qimcis marked this pull request as ready for review February 13, 2026 15:07
@yingluosanqian (Collaborator) commented:

Could you also share the images generated before and after the fusion so we can check the precision?

@qimcis (Contributor, Author) commented Mar 10, 2026

Are we still looking to get this merged? @yingluosanqian I believe it should be ready for review.

@yingluosanqian (Collaborator) commented:

Replying to: "Are we still looking to get this merged? @yingluosanqian I believe it should be ready for review."

Hi, I left some comments earlier. Could you please take a look and address them first?

@qimcis (Contributor, Author) commented Mar 11, 2026

Replying to: "Could you also share the images generated before and after the fusion so we can check the precision?"

If you mean this comment, I attached the images in the PR description!

Review comment threads (outdated):
  • python/sglang/jit_kernel/diffusion/cutedsl/norm_tanh_mul_add_norm_scale.py
  • python/sglang/multimodal_gen/runtime/models/dits/zimage.py
  • python/sglang/multimodal_gen/runtime/layers/layernorm.py
yingluosanqian enabled auto-merge (squash) April 2, 2026 01:29
yingluosanqian and others added 6 commits April 2, 2026 01:59
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
auto-merge was automatically disabled April 2, 2026 02:09

Head branch was pushed to by a user without write access

qimcis force-pushed the diffusion-norm-fusion-for-zimage branch from 5280ba6 to 17bdca1 April 2, 2026 02:09

@mickqian (Collaborator) commented Apr 3, 2026

/rerun-failed-ci

@yhyang201 (Collaborator) commented:

/tag-and-rerun-ci

@yhyang201 (Collaborator) commented:

/rerun-failed-ci

mickqian merged commit 005e582 into sgl-project:main Apr 4, 2026
191 of 250 checks passed
sundar24295s pushed a commit to sundar24295s/sglang that referenced this pull request Apr 4, 2026
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: yihanc <yingluosanqian@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Fridge003 pushed a commit that referenced this pull request Apr 7, 2026
xiezhq-hermann pushed a commit to antgroup/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026