[diffusion] Add Sage Attention 3 Support for sm 120 (RTX5090)#15382
mickqian merged 15 commits into sgl-project:main
Conversation
if model is not None:
    model.to("cpu")
    logger.info(
        "Offloaded denoiser transformer weights to CPU after denoising to reduce peak VRAM during VAE decoding."
    )
Should we set dit_cpu_offload here?
dit_cpu_offload is used to swap the transformer in and out during the denoising process, which is useful for all serving modes. Here, though, the goal is to offload all transformers after the denoising stage (they won't be used again in offline mode), so I think is_local_mode would be better.
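The gating discussed above can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper name and an is_local_mode flag; it is not necessarily SGLang's actual API:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def offload_denoiser_after_denoise(model, is_local_mode: bool) -> None:
    """Offload the denoiser transformer to CPU after the denoising stage.

    In local/offline mode the transformer is not needed again, so moving it
    to CPU frees VRAM headroom for VAE decoding. In serving modes the
    transformer will be reused for later requests, so it stays on the GPU.
    (Hypothetical helper, sketched from the PR discussion.)
    """
    if model is None or not is_local_mode:
        return
    model.to("cpu")
    logger.info(
        "Offloaded denoiser transformer weights to CPU after denoising "
        "to reduce peak VRAM during VAE decoding."
    )
```

In serving mode the function is a no-op, which is the distinction the reviewer is drawing between dit_cpu_offload (per-step swapping) and a one-shot post-denoise offload.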
Co-authored-by: Mengxi Li <marcyleemx@gmail.com>
/rerun-failed-ci
/tag-and-rerun-ci
Hi @ryang-max, could you share your launch command?
Hi @IPostYellow, just use the default command described in the blog to start it. If you hit OOM, please try a smaller image size. Currently we have tested flux1 and wan2.1-t2v-1.3B at 480p, and we're actively working on optimizing memory usage and parallelism on the 5090.
Hi @IPostYellow, yes, Qwen-Image has an issue that we're working on fixing. You can try flux.1-dev or Wan-AI/Wan2.1-I2V-14B-480P-Diffusers for now.
Motivation
Support SGLang Diffusion on the RTX 5090.
Modifications
Since Sage Attention 3 already supports the RTX 5090, we choose it as the default attention backend on sm 120 devices.
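The backend defaulting described above can be sketched as a capability check. This is an illustrative sketch, not SGLang's actual dispatch code: the function name and backend identifiers are assumptions, and in real code the (major, minor) pair would come from torch.cuda.get_device_capability():

```python
def select_attention_backend(capability):
    """Return a default attention backend for a (major, minor) compute
    capability pair.

    sm 120 (major=12, minor=0) corresponds to Blackwell consumer GPUs
    such as the RTX 5090, where Sage Attention 3 is available; everything
    else falls back to PyTorch's scaled_dot_product_attention.
    Backend names here are illustrative, not SGLang's real identifiers.
    """
    if capability == (12, 0):  # sm 120: RTX 5090
        return "sage_attn_3"
    return "torch_sdpa"
```

Keeping the capability as an explicit argument makes the selection logic easy to unit-test without a GPU present.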
Accuracy Tests
Tested with:
- black-forest-labs/FLUX.1-dev
- Wan-AI/Wan2.1-I2V-14B-480P-Diffusers

Benchmarking and Profiling
For image models, no significant acceleration compared with torch_sdpa.

For video models:
Comparison
Limitation
As mentioned in the Sage Attention official repo, running all steps with SageAttn3 may introduce some quality loss in inference. I also observed this in my experiments. Will fix it with hybrid attention in another PR.
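One common form of the hybrid-attention idea is a per-step backend schedule: run the quality-sensitive early denoising steps with the full-precision backend and the rest with SageAttn3. The sketch below is purely an assumption about what such a schedule could look like; the function name, backend identifiers, and boundary heuristic are hypothetical and do not describe the follow-up PR:

```python
def backend_for_step(step: int, num_steps: int, boundary: int = 2) -> str:
    """Hypothetical hybrid schedule: use torch_sdpa for the first few
    denoising steps (where quantization error hurts quality most), then
    switch to the faster sage_attn_3 for the remaining steps.

    The boundary of 2 early steps is an illustrative default, not a
    tuned value from this PR.
    """
    return "torch_sdpa" if step < boundary else "sage_attn_3"
```

A schedule like this trades a small amount of speed on the early steps for reduced cumulative quantization loss over the whole trajectory.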
Checklist