[Diffusion] Add Qwen Image ModelOpt FP8 support #23155

BBuf merged 11 commits into sgl-project:main from
Conversation
Code Review
This pull request introduces quantization support for the Qwen Image model. Key changes include replacing standard linear layers with ReplicatedLinear to support quantization configurations and prefixes, and implementing custom QwenImageGELU and QwenImageFeedForward modules to maintain compatibility with the model's expected state dict structure. Additionally, the PR adds FP8 fallback patterns for Qwen Image and includes comprehensive unit tests to verify prefixing and quantization method assignments. I have no feedback to provide.
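To illustrate the prefixing the review mentions, here is a minimal sketch of how dotted state-dict prefixes for the Qwen Image MLP linears compose. The helper function and the `net.0.proj` naming are assumptions based on the diffusers-style layout, not SGLang's actual code:

```python
# Illustrative only: shows how state-dict prefixes for the two linear
# projections inside a Qwen Image MLP block might compose. The helper
# and the net.0.proj name are assumptions, not SGLang's real API.
def mlp_linear_prefixes(block_idx: int, mlp: str = "img_mlp") -> list[str]:
    base = f"transformer_blocks.{block_idx}.{mlp}"
    # net.0.proj: input projection inside the GELU module;
    # net.2: output projection (the layer this PR keeps in BF16).
    return [f"{base}.net.0.proj", f"{base}.net.2"]

print(mlp_linear_prefixes(0))
# ['transformer_blocks.0.img_mlp.net.0.proj', 'transformer_blocks.0.img_mlp.net.2']
```

Unit tests of this kind of scheme can then assert that each wrapped linear receives the expected prefix and quantization method.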
Force-pushed from 30fad97 to aca4193
Qwen Image / Image Edit FP8 validation update

The clean SGLang-native ModelOpt FP8 transformer checkpoints now live under the
These repos intentionally contain only the model card,

Benchmark summary

Native SGLang backend, H100 rank0,

Quality / profiler notes
Force-pushed from aca4193 to 017dfc3
Force-pushed from 017dfc3 to f36faeb
Force-pushed from f36faeb to 321cbdf
/tag-and-rerun-ci
/tag-and-rerun-ci
Updated this PR to use the new clean
Pushed one follow-up lint fix (
/tag-and-rerun-ci
# Conflicts:
#   docs/diffusion/quantization.md
#   docs_new/docs/sglang-diffusion/quantization.mdx
#   python/sglang/multimodal_gen/test/server/testcase_configs.py
/tag-and-rerun-ci
Summary
Add ModelOpt FP8 support for Qwen Image diffusion transformers in the SGLang runtime and FP8 converter.
- Keep transformer_blocks.*.img_mlp.net.2 in BF16, after image-quality ablation.
- Published FP8 weights under the lmsys Hugging Face org.
- Updated docs/diffusion/quantization.md.
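The img_mlp.net.2 fallback is a prefix-pattern exclusion. A minimal sketch of how such pattern matching typically works; the helper name and pattern-list constant are assumptions, not SGLang's actual code:

```python
import fnmatch

# Illustrative sketch: layers whose dotted prefix matches a fallback
# pattern keep their original (BF16) weights instead of FP8.
# The pattern below is the one this PR keeps in BF16; the helper name
# is an assumption, not SGLang's real API.
FP8_FALLBACK_PATTERNS = ["transformer_blocks.*.img_mlp.net.2"]

def keeps_bf16(prefix: str, patterns=FP8_FALLBACK_PATTERNS) -> bool:
    """True if this layer prefix should skip FP8 quantization."""
    return any(fnmatch.fnmatch(prefix, pat) for pat in patterns)

print(keeps_bf16("transformer_blocks.12.img_mlp.net.2"))       # True
print(keeps_bf16("transformer_blocks.12.img_mlp.net.0.proj"))  # False
```

Note that fnmatch's `*` matches across dots, so one pattern covers every block index.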
Both repos are intentionally clean transformer override repos: README.md, config.json, and .safetensors shards only.

Validation
Validated on H100 rank0 (CUDA_VISIBLE_DEVICES=0) for the generated artifacts and benchmarks.
6ecd6f84d26ae8da51

- python -m compileall -q python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py python/sglang/multimodal_gen/tools/build_modelopt_fp8_transformer.py -> passed
- isort, black, and ruff check --select=F401,F821 --fix on changed Python files -> passed, no further changes
- git diff --check -> passed
- python3 -m py_compile python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed after adding B200 cases
- python3 -m black --check python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed
- python3 -m ruff check --select=F401,F821 python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed

Qwen Image 1024x1024, 50 steps
Prompt: A futuristic cyberpunk city at night, neon lights reflecting on wet streets
(img_mlp.net.2 BF16 fallback): visually normal
sglang generate --backend=sglang --warmup:

Qwen Image Edit 512x512, 8 steps
Prompt: A clean product photo of a small ceramic teapot on a wooden table, soft daylight, sharp details.
sglang generate --backend=sglang --warmup:

B200 CI
Added to ONE_GPU_MODELOPT_CASES for multimodal-gen-test-1-b200:

- qwen_image_modelopt_fp8_t2i
- qwen_image_edit_modelopt_fp8_ti2i

Notes
Profiler runs used --profile --num-profiled-timesteps=2.
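For reference, the benchmark and profiler flags quoted in this PR compose into an invocation like the following sketch; the model, prompt, and output arguments are omitted here and would need to be supplied:

```shell
# Sketch only: combines the flags quoted above; the actual runs in this
# PR also pass model/prompt arguments that are not shown here.
sglang generate --backend=sglang --warmup \
  --profile --num-profiled-timesteps=2
```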