[diffusion] Support stable-diffusion-3-medium-diffusers with sglang backend by gxlvera · Pull Request #19225 · sgl-project/sglang

gxlvera · 2026-02-24T07:02:06Z

Overview

This PR adds support for

stable-diffusion-3-medium
stable-diffusion-3.5-medium
stable-diffusion-3.5-large

as sglang-native models (using sglang as the backend instead of diffusers).

Note:

This PR does not support tensor parallelism (TP) yet.
It does not use sglang's USPAttention, so sequence parallelism (SP) is not supported yet either.

Run with the CLI:

SGLANG_USE_MODELSCOPE=true sglang generate \
  --model-path stabilityai/stable-diffusion-3-medium-diffusers \
  --prompt "Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says Stable Diffusion 3 made out of colorful energy" \
  --height 1024 \
  --width 1024 \
  --num-inference-steps 20 \
  --guidance-scale 7.0 \
  --num-gpus 1 \
  --backend sglang

[Image: generated output for the prompt above]

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

ping1jing2 · 2026-02-26T10:20:52Z

Please provide a performance comparison report according to docs/diffusion/contributing.md:

Baseline: run the benchmark (for a single generation task)

$ sglang generate --model-path <model> --prompt "A benchmark prompt" --perf-dump-path baseline.json

New: run the same benchmark, without modifying any server_args or sampling_params

$ sglang generate --model-path <model> --prompt "A benchmark prompt" --perf-dump-path new.json

Compare: run the compare script, which will print a Markdown table to the console

$ python python/sglang/multimodal_gen/benchmarks/compare_perf.py baseline.json new.json [new2.json ...]
### Performance Comparison Report
...

Paste: paste the table into the PR description
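
The steps above can be sketched as a small Python helper that assembles the three commands. This is only an illustration: the function names and the `workflow` helper are hypothetical, the flags and script path are copied from the commands quoted above, and the commands are printed rather than executed on the assumption that the baseline and new runs are made from different checkouts.

```python
import shlex

# Script path as quoted in the comment above.
COMPARE_SCRIPT = "python/sglang/multimodal_gen/benchmarks/compare_perf.py"


def generate_cmd(model_path, dump_path, prompt="A benchmark prompt"):
    """Build one `sglang generate` benchmark run (hypothetical helper)."""
    return [
        "sglang", "generate",
        "--model-path", model_path,
        "--prompt", prompt,
        "--perf-dump-path", dump_path,
    ]


def compare_cmd(baseline, *new_dumps):
    """Build the compare step; at least one new dump is required."""
    if not new_dumps:
        raise ValueError("need at least one new perf dump to compare")
    return ["python", COMPARE_SCRIPT, baseline, *new_dumps]


def workflow(model_path):
    """Return the three shell command lines in the order they are run."""
    return [
        shlex.join(generate_cmd(model_path, "baseline.json")),
        shlex.join(generate_cmd(model_path, "new.json")),
        shlex.join(compare_cmd("baseline.json", "new.json")),
    ]


if __name__ == "__main__":
    for line in workflow("stabilityai/stable-diffusion-3-medium-diffusers"):
        print("$", line)
```

Keeping the prompt and every other argument identical between the two `generate` calls mirrors the requirement not to modify any server_args or sampling_params between runs.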

gxlvera · 2026-02-27T07:46:00Z

Ok I will do that, thanks!

zhaochenyang20 · 2026-02-27T21:59:44Z

sglang generate \
  --model-path stabilityai/stable-diffusion-3-medium-diffusers \
  --prompt "Close-up shot of a tiny, fluffy white Siberian Forest kitten cuddling and allogrooming with a massive, round brown Maine Coon/British Shorthair mix. They are curled together, softly licking each other's fur." \
  --height 1024 \
  --width 1024 \
  --num-inference-steps 20 \
  --guidance-scale 7.0 \
  --num-gpus 1 \
  --backend sglang

…d3 vae select vae logics in vae_loader.py with minimum modification

…n/nsa/utils.py

mickqian

TODO:

support combined cfg

footer

…encoder_index out of bound

…s in base.py and sd3 flux to customize encoder attention mask handling and pooler output handling

…lete sd3 hardcode in vae_loader.py

mickqian · 2026-04-10T10:36:09Z

+            )
            if is_flux_v1:
                pooled_embeds_list.append(outputs.pooler_output)
+            elif is_sd3 and i <= 1:


Could we use something like use_pooler_output = is_sd3 or is_flux_v1?
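
A minimal sketch of what that suggestion might look like, not the code that was merged. The function name `collect_pooled_embeds` and the stand-in `encoder_outputs` list are hypothetical; the sketch also assumes the sd3 branch appends `outputs.pooler_output` (the diff excerpt does not show its body) and keeps the `i <= 1` guard from the diff.

```python
from types import SimpleNamespace


def collect_pooled_embeds(encoder_outputs, is_flux_v1, is_sd3):
    """Gather pooler outputs behind a single use_pooler_output flag.

    encoder_outputs: one object per text encoder, each exposing
    `.pooler_output` (stand-ins for the real encoder outputs).
    """
    use_pooler_output = is_sd3 or is_flux_v1
    pooled_embeds_list = []
    for i, outputs in enumerate(encoder_outputs):
        if not use_pooler_output:
            continue
        # Keep the diff's index guard: for sd3, only encoders 0 and 1
        # contribute a pooled embedding.
        if is_sd3 and i > 1:
            continue
        pooled_embeds_list.append(outputs.pooler_output)
    return pooled_embeds_list


if __name__ == "__main__":
    outs = [SimpleNamespace(pooler_output=name) for name in ("a", "b", "c")]
    print(collect_pooled_embeds(outs, is_flux_v1=False, is_sd3=True))
    print(collect_pooled_embeds(outs, is_flux_v1=True, is_sd3=False))
```

Note that the combined flag alone would drop the `i <= 1` condition, so the index check still has to live somewhere if the third encoder's output must be excluded for sd3.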

yhyang201 · 2026-04-13T07:35:26Z

/tag-and-rerun-ci

…roject#19225) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: Kangrui Du <kangruidu@gmail.com> Co-authored-by: Xiaole Guo <gxlvera@gmail.com>

gxlvera requested review from BBuf, mickqian, ping1jing2, yhyang201 and yingluosanqian as code owners February 24, 2026 07:02

github-actions Bot added the diffusion SGLang Diffusion label Feb 24, 2026

Comment thread python/sglang/multimodal_gen/runtime/models/registry.py Outdated