[Diffusion] Add Qwen Image ModelOpt FP8 support#23155

Merged
BBuf merged 11 commits into sgl-project:main from BBuf:codex/qwen-image-modelopt-fp8
May 3, 2026
Conversation


@BBuf BBuf commented Apr 19, 2026

Summary

Add ModelOpt FP8 support for Qwen Image diffusion transformers in the SGLang runtime and FP8 converter.

  • make Qwen Image attention, MLP, and top-level projections quant-aware with full checkpoint prefixes
  • add a Qwen Image / Qwen Image Edit BF16 fallback profile, including transformer_blocks.*.img_mlp.net.2 after image-quality ablation
  • fix the FP8 converter ordering so explicit BF16 fallback tensors are written out before the ModelOpt ignore-preservation step skips the source tensor
  • publish clean SGLang-native ModelOpt FP8 transformer overrides under the lmsys Hugging Face org
  • document the validated Qwen Image and Qwen Image Edit ModelOpt FP8 checkpoint flow in docs/diffusion/quantization.md
  • update the diffusion ModelOpt quant skill with the Qwen Image FP8 fallback and converter-ordering notes
  • add Qwen Image and Qwen Image Edit ModelOpt FP8 cases to the B200 diffusion CI set
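The converter-ordering fix in the list above can be sketched roughly as follows. This is an illustrative sketch only, not the real `build_modelopt_fp8_transformer.py` code: the fallback pattern `transformer_blocks.*.img_mlp.net.2` comes from this PR, while `route_tensor` and `MODELOPT_IGNORE_PATTERNS` are made-up names standing in for the converter's real logic.

```python
from fnmatch import fnmatch

# Pattern taken from the PR description; kept in BF16 after the
# image-quality ablation.
BF16_FALLBACK_PATTERNS = ["transformer_blocks.*.img_mlp.net.2.*"]
# Placeholder for whatever tensors ModelOpt's ignore-preservation skips.
MODELOPT_IGNORE_PATTERNS = ["*.norm.*"]

def route_tensor(name: str) -> str:
    """Decide how the converter handles one checkpoint tensor name."""
    # 1) Explicit BF16 fallback wins: the tensor is written out first,
    #    before any skip logic can drop it (the bug being fixed).
    if any(fnmatch(name, p) for p in BF16_FALLBACK_PATTERNS):
        return "bf16_fallback"
    # 2) Only afterwards does the generic ignore-preservation skip apply.
    if any(fnmatch(name, p) for p in MODELOPT_IGNORE_PATTERNS):
        return "skipped"
    # 3) Everything else is quantized to FP8 with scale tensors.
    return "fp8_quantized"

print(route_tensor("transformer_blocks.3.img_mlp.net.2.weight"))  # bf16_fallback
print(route_tensor("transformer_blocks.3.norm.weight"))           # skipped
print(route_tensor("transformer_blocks.3.attn.to_q.weight"))      # fp8_quantized
```

The key point is step ordering: checking the explicit fallback list before the ignore list is what guarantees the BF16 tensors land in the output shards.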

Published FP8 weights

Both repos are intentionally clean transformer override repos: README.md, config.json, and .safetensors shards only.

Validation

Validated on H100 rank0 (CUDA_VISIBLE_DEVICES=0) for the generated artifacts and benchmarks.

  • SGLang main base used during validation: 6ecd6f84d
  • ModelOpt main used for PTQ export: 26ae8da51
  • python -m compileall -q python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py python/sglang/multimodal_gen/tools/build_modelopt_fp8_transformer.py -> passed
  • isort, black, and ruff check --select=F401,F821 --fix on changed Python files -> passed, no further changes
  • git diff --check -> passed
  • python3 -m py_compile python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed after adding B200 cases
  • python3 -m black --check python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed
  • python3 -m ruff check --select=F401,F821 python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed

Qwen Image 1024x1024, 50 steps

Prompt: A futuristic cyberpunk city at night, neon lights reflecting on wet streets

  • BF16 image: visually normal
  • Old FP8 default: severe dark/blurred quality regression
  • Fixed FP8 (img_mlp.net.2 BF16 fallback): visually normal
  • converter stats: 660 quantized weights, 1320 scale tensors, 186 BF16 fallback weights, 3 output shards
  • benchmark with native sglang generate --backend=sglang --warmup:
    • E2E: 13589.20 ms -> 12159.39 ms, 10.5% faster
    • Denoising: 12928.76 ms -> 11437.40 ms, 11.5% faster

Qwen Image Edit 512x512, 8 steps

Prompt: A clean product photo of a small ceramic teapot on a wooden table, soft daylight, sharp details.

  • BF16 and fixed FP8 images are visually aligned for the smoke edit workload
  • converter stats: 660 quantized weights, 1320 scale tensors, 186 BF16 fallback weights, 3 output shards
  • benchmark with native sglang generate --backend=sglang --warmup:
    • E2E: 6791.97 ms -> 6085.03 ms, 10.4% faster
    • Denoising: 5204.32 ms -> 4524.01 ms, 13.1% faster

B200 CI

Added to ONE_GPU_MODELOPT_CASES for multimodal-gen-test-1-b200:

  • qwen_image_modelopt_fp8_t2i
  • qwen_image_edit_modelopt_fp8_ti2i
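The CI wiring amounts to registering the two case names above. The real structure of `testcase_configs.py` / `gpu_cases.py` is not shown in this PR, so the shape below is an assumption; only the two case identifiers are taken from the PR.

```python
# Hypothetical sketch of the B200 ModelOpt case list; the actual
# registration format in gpu_cases.py may differ.
ONE_GPU_MODELOPT_CASES = [
    # ... existing ModelOpt diffusion cases ...
    "qwen_image_modelopt_fp8_t2i",        # Qwen Image, text-to-image FP8
    "qwen_image_edit_modelopt_fp8_ti2i",  # Qwen Image Edit, text+image-to-image FP8
]

print(len([c for c in ONE_GPU_MODELOPT_CASES if c.startswith("qwen_image")]))
```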

Notes

  • The ModelOpt Qwen model registration used for PTQ export was patched only in the H100 validation checkout and is not included here.
  • The fixed profiler trace was captured for Qwen Image 1024x1024, 8 steps, --profile --num-profiled-timesteps=2.

@github-actions github-actions Bot added quant LLM Quantization diffusion SGLang Diffusion labels Apr 19, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces quantization support for the Qwen Image model. Key changes include replacing standard linear layers with ReplicatedLinear to support quantization configurations and prefixes, and implementing custom QwenImageGELU and QwenImageFeedForward modules to maintain compatibility with the model's expected state dict structure. Additionally, the PR adds FP8 fallback patterns for Qwen Image and includes comprehensive unit tests to verify prefixing and quantization method assignments. I have no feedback to provide.
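The pattern the review describes can be illustrated with a minimal stand-in: each linear layer receives its full checkpoint prefix, and the quant config uses that prefix to decide per-layer whether to quantize or fall back to BF16. `ReplicatedLinear`'s real signature lives in SGLang; the classes below are simplified assumptions that only mimic the idea.

```python
class FakeQuantConfig:
    """Stand-in for a ModelOpt quant config with BF16 fallback patterns."""
    def __init__(self, bf16_fallback_substrings):
        self.bf16_fallback_substrings = bf16_fallback_substrings

    def get_quant_method(self, prefix: str) -> str:
        # Layers whose prefix matches a fallback entry stay unquantized.
        if any(s in prefix for s in self.bf16_fallback_substrings):
            return "unquantized"
        return "fp8"

class QuantAwareLinear:
    """Minimal stand-in for ReplicatedLinear(..., quant_config, prefix=...)."""
    def __init__(self, in_features, out_features, quant_config, prefix):
        self.prefix = prefix
        # The full checkpoint prefix is what lets per-layer decisions work.
        self.quant_method = quant_config.get_quant_method(prefix)

cfg = FakeQuantConfig(["img_mlp.net.2"])
mlp_out = QuantAwareLinear(4096, 3072, cfg, "transformer_blocks.0.img_mlp.net.2")
attn_q = QuantAwareLinear(3072, 3072, cfg, "transformer_blocks.0.attn.to_q")
print(mlp_out.quant_method)  # unquantized (BF16 fallback)
print(attn_q.quant_method)   # fp8
```

This is why the PR threads full checkpoint prefixes through attention, MLP, and top-level projections: without the prefix, the quant config cannot tell `img_mlp.net.2` apart from any other linear.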

@BBuf BBuf force-pushed the codex/qwen-image-modelopt-fp8 branch 3 times, most recently from 30fad97 to aca4193 on April 19, 2026 13:46

BBuf commented Apr 19, 2026

Qwen Image / Image Edit FP8 validation update

The clean SGLang-native ModelOpt FP8 transformer checkpoints now live under the lmsys Hugging Face org:

These repos intentionally contain only the model card, config.json, and .safetensors shards. Validation images, benchmark JSON, command logs, and profiler traces are intentionally not stored in the clean model repos.

Benchmark summary

Native SGLang backend, H100 rank0, CUDA_VISIBLE_DEVICES=0, sglang generate --backend=sglang --warmup. FP8 uses the fixed checkpoint with transformer_blocks.*.img_mlp.net.2 kept as BF16 fallback.

| Workload | BF16 E2E | FP8 E2E | E2E speedup | BF16 denoising | FP8 denoising | Denoising speedup |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen Image 1024x1024, 50 steps | 13589.20 ms | 12159.39 ms | 10.5% | 12928.76 ms | 11437.40 ms | 11.5% |
| Qwen Image Edit 512x512, 8 steps | 6791.97 ms | 6085.03 ms | 10.4% | 5204.32 ms | 4524.01 ms | 13.1% |
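The speedup percentages follow from the raw timings as `(bf16 - fp8) / bf16`; a quick sanity check:

```python
def speedup_pct(bf16_ms: float, fp8_ms: float) -> float:
    """Relative speedup of FP8 over BF16, as a percentage of BF16 time."""
    return round((bf16_ms - fp8_ms) / bf16_ms * 100, 1)

print(speedup_pct(13589.20, 12159.39))  # 10.5  (Qwen Image E2E)
print(speedup_pct(12928.76, 11437.40))  # 11.5  (Qwen Image denoising)
print(speedup_pct(6791.97, 6085.03))    # 10.4  (Qwen Image Edit E2E)
print(speedup_pct(5204.32, 4524.01))    # 13.1  (Qwen Image Edit denoising)
```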

Quality / profiler notes

  • Qwen Image BF16 and fixed FP8 outputs were visually aligned for the 1024x1024 50-step prompt; the old default FP8 checkpoint had the severe dark/blurred regression.
  • Qwen Image Edit BF16 and fixed FP8 outputs were visually aligned for the 512x512 8-step smoke edit workload.
  • Converter stats for both Qwen Image and Qwen Image Edit: 660 quantized weights, 1320 scale tensors, 186 BF16 fallback weights, 3 output shards.
  • Qwen Image 1024x1024, 8-step profiler capture: BF16 802.00 ms total CUDA kernel time vs fixed FP8 581.71 ms in the profiled region; FP8 CUTLASS GEMMs replace/reduce the dominant BF16 GEMM bucket while _static_quant_fp8 accounts for about 4.2% of captured CUDA kernel time.

@BBuf BBuf marked this pull request as ready for review April 19, 2026 14:10
@BBuf BBuf force-pushed the codex/qwen-image-modelopt-fp8 branch from aca4193 to 017dfc3 on April 19, 2026 22:58
@BBuf BBuf force-pushed the codex/qwen-image-modelopt-fp8 branch from 017dfc3 to f36faeb on April 19, 2026 23:04
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Apr 19, 2026
@BBuf BBuf requested a review from wisclmy0611 as a code owner April 25, 2026 01:00

BBuf commented Apr 25, 2026

/tag-and-rerun-ci

1 similar comment

BBuf commented Apr 25, 2026

/tag-and-rerun-ci

@BBuf BBuf requested a review from JustinTong0323 as a code owner April 28, 2026 08:14

BBuf commented Apr 28, 2026

Updated this PR to use the new clean lmsys ModelOpt diffusion repos.


BBuf commented Apr 28, 2026

Pushed one follow-up lint fix (cd1ab4de3) for the import ordering reported by CI. The latest lint check is now green; multimodal-gen-component-accuracy and multimodal-gen-test-1-b200 are still queued.


BBuf commented May 2, 2026

/tag-and-rerun-ci

BBuf added 2 commits May 2, 2026 21:12
# Conflicts:
#	docs/diffusion/quantization.md
#	docs_new/docs/sglang-diffusion/quantization.mdx
#	python/sglang/multimodal_gen/test/server/testcase_configs.py

BBuf commented May 3, 2026

/tag-and-rerun-ci


BBuf commented May 3, 2026

@BBuf BBuf merged commit f2d1390 into sgl-project:main May 3, 2026
71 of 79 checks passed

Labels

diffusion SGLang Diffusion documentation Improvements or additions to documentation high priority quant LLM Quantization run-ci

2 participants