
[diffusion] Add Sage Attention 3 Support for sm 120 (RTX5090)#15382

Merged
mickqian merged 15 commits into sgl-project:main from ryang-max:diffusion5090_1
Dec 19, 2025

Conversation

@ryang-max
Contributor

@ryang-max ryang-max commented Dec 18, 2025

Motivation

Support SGLang Diffusion in RTX 5090.

Modifications

Since Sage Attention 3 already supports the RTX 5090, we select it as the default attention backend on sm120 devices.
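For context, sm120 is the compute capability reported by Blackwell consumer GPUs such as the RTX 5090. A minimal sketch of such a check, assuming the helper names for illustration (the real detection lives in SGLang's platform layer, and a live run would feed it `torch.cuda.get_device_capability()`):

```python
# Hedged sketch: these helper names are illustrative, not SGLang's exact API.
def sm_arch(capability: tuple) -> str:
    """Format a (major, minor) compute capability as an sm_XY arch string."""
    major, minor = capability
    return f"sm_{major}{minor}"

def is_sm120(capability: tuple) -> bool:
    """An RTX 5090 reports compute capability (12, 0), i.e. sm_120."""
    return tuple(capability) == (12, 0)
```

For example, `is_sm120((12, 0))` is True for an RTX 5090, while an H100 reporting `(9, 0)` would fall through to the other backends.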

Accuracy Tests

Tested with:

  • black-forest-labs/FLUX.1-dev
  • Wan-AI/Wan2.1-I2V-14B-480P-Diffusers

Benchmarking and Profiling

For image models, there is no significant acceleration compared with torch_sdpa.

For video models:

```shell
sglang generate --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A curious raccoon" \
    --save-output
```

Comparison

| Metric | sage_attn_3 (default) | torch_sdpa |
| --- | --- | --- |
| Average time per step (s/step) | 1.8552 | 3.2210 |
| Total DenoisingStage time (s) | 94.1848 | 162.5129 |
| Speedup (torch_sdpa / sage_attn_3) | 1.74× | 1.00× |
| Time reduction (vs. torch_sdpa) | 42.4% | 0% |
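The derived rows follow directly from the measured per-step times; as a quick sanity check:

```python
# Recompute the table's speedup and time-reduction rows from the measured
# per-step times of the Wan2.1-T2V-1.3B run above.
sage_step = 1.8552  # s/step with sage_attn_3
sdpa_step = 3.2210  # s/step with torch_sdpa

speedup = sdpa_step / sage_step        # wall-time speedup of sage_attn_3
reduction = 1 - sage_step / sdpa_step  # fraction of per-step time saved
print(f"speedup={speedup:.2f}x, reduction={reduction:.1%}")
# → speedup=1.74x, reduction=42.4%
```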

Limitation
As mentioned in the SageAttention official repo, running all steps with SageAttn3 may introduce some accuracy loss in inference. I also observed this in my experiments; it will be fixed with hybrid attention in a follow-up PR.

Checklist


@github-actions github-actions Bot added the diffusion SGLang Diffusion label Dec 18, 2025
Comment thread python/sglang/multimodal_gen/runtime/server_args.py Outdated
```python
if model is not None:
    model.to("cpu")
    logger.info(
        "Offloaded denoiser transformer weights to CPU after denoising to reduce peak VRAM during VAE decoding."
    )
```
Collaborator


should we set dit_cpu_offload?

Contributor Author


dit_cpu_offload was used to swap the transformer in and out during the denoising process, which is useful for all serving modes. Here, though, the goal is to offload all transformers after the denoising stage (they won't be used again in offline mode), so I think is_local_mode would be better.
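A minimal sketch of the post-denoising offload under discussion, assuming a torch module and a local-mode flag (the function and flag names are illustrative, not SGLang's exact API):

```python
import torch

def offload_after_denoising(model, is_local_mode: bool) -> None:
    """Move DiT weights to CPU after denoising so VAE decoding gets the VRAM.

    Only applied in local (offline) mode, where the transformer will not be
    needed again for a subsequent request.
    """
    if is_local_mode and model is not None:
        model.to("cpu")
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return the freed cached blocks to the driver
```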

Comment thread python/sglang/multimodal_gen/runtime/platforms/cuda.py Outdated
@mickqian
Collaborator

/rerun-failed-ci

@mickqian
Collaborator

/tag-and-rerun-ci

@mickqian mickqian merged commit 1e58248 into sgl-project:main Dec 19, 2025
164 of 174 checks passed
@IPostYellow
Contributor

Hi @ryang-max, could you share your launch cmd?

@ryang-max
Contributor Author

ryang-max commented Dec 22, 2025

> Hi @ryang-max, could you share your launch cmd?

Hi @IPostYellow, just use the default command described in the blog to start it. If you hit OOM, please try a smaller image size. Currently we have tested flux1 and wan2.1-t2v-1.3B at 480p, and we're actively working on optimizing memory usage and parallelism on the 5090.

@IPostYellow
Contributor

IPostYellow commented Dec 22, 2025

> Hi @ryang-max, could you share your launch cmd?
>
> Hi @IPostYellow, just use the default command described in the blog to start it. If you hit OOM, please try a smaller image size. Currently we have tested flux1 and wan2.1-t2v-1.3B at 480p, and we're actively working on optimizing memory usage and parallelism on the 5090.
@ryang-max thank you for your reply.
During my experiments with qwen-image, I encountered the following error:
`RuntimeError: The size of tensor a (28) must match the size of tensor b (4) at non-singleton dimension 1`
I guess this is because sage3 does not support Qwen2.5-VL, but in sglang/multimodal_gen/runtime/platforms/cuda.py it seems every attention backend will be set to sage3 on the 5090 if selected_backend == None, due to:

```python
if is_sm120():
    try:
        from sglang.multimodal_gen.runtime.layers.attention.backends.sage_attn3 import (  # noqa: F401
            SageAttention3Backend,
        )

        logger.info("Using Sage Attention 3 backend")
        return "sglang.multimodal_gen.runtime.layers.attention.backends.sage_attn3.SageAttention3Backend"
    except ImportError as e:
        logger.info(e)
        logger.info(
            "Sage Attention 3 backend is not installed, Falling back to Torch SDPA (To install it, see https://github.com/thu-ml/SageAttention/tree/main/sageattention3_blackwell#installation)"
        )
        target_backend = AttentionBackendEnum.TORCH_SDPA
```

@ryang-max
Contributor Author

Hi @IPostYellow, yes Qwen-Image has some issues; we're working on fixing them. You can try flux.1-dev or Wan-AI/Wan2.1-I2V-14B-480P-Diffusers for now.

@ryang-max ryang-max deleted the diffusion5090_1 branch December 23, 2025 01:09
Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 23, 2025
…RTX5090) (sgl-project#15382)

Co-authored-by: Mengxi Li <marcyleemx@gmail.com>
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
…RTX5090) (sgl-project#15382)

Co-authored-by: Mengxi Li <marcyleemx@gmail.com>
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
…RTX5090) (sgl-project#15382)

Co-authored-by: Mengxi Li <marcyleemx@gmail.com>

Labels

diffusion SGLang Diffusion run-ci
