
[diffusion] attention: add AITER Sage attention backend#20178

Merged
mickqian merged 3 commits into sgl-project:main from avjves:feature/aiter_sage_attention_support
Mar 11, 2026

Conversation

@avjves (Contributor) commented Mar 9, 2026

Motivation

Sage attention is currently supported only on NVIDIA GPUs; AMD support is missing. This PR adds Sage attention support for AMD GPUs via AITER.

Modifications

  • Adds a new backend type AITER_SAGE
  • Registers it as a supported attention backend wherever Sage attention is typically supported
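Registering the new backend type amounts to extending the attention backend enum. A minimal sketch of what that looks like (the other member names here are illustrative assumptions, not the actual contents of `AttentionBackendEnum`):

```python
from enum import Enum

# Sketch only: adding AITER_SAGE alongside existing backend members.
# Member names other than AITER_SAGE are assumptions for illustration.
class AttentionBackendEnum(Enum):
    AITER = "aiter"
    SAGE = "sage"
    AITER_SAGE = "aiter_sage"  # new: Sage attention via AITER on AMD GPUs
```

Downstream code can then look the backend up by its string value, e.g. `AttentionBackendEnum("aiter_sage")`.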

Accuracy Tests

Benchmarking and Profiling

Performance Comparison Report

1. High-level Summary

| Metric | Baseline | New | Diff | Status |
| --- | --- | --- | --- | --- |
| E2E Latency | 276887.31 ms | 241404.22 ms | -35483.09 ms (-12.8%) | |
| Throughput | 0.00 req/s | 0.00 req/s | - | - |
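The diff column in the summary can be reproduced directly from the two raw latencies:

```python
# Reproduce the E2E latency diff from the baseline and new measurements.
baseline_ms = 276887.31
new_ms = 241404.22

diff_ms = new_ms - baseline_ms      # absolute change in ms
pct = diff_ms / baseline_ms * 100   # relative change in percent
print(f"{diff_ms:.2f} ms ({pct:.1f}%)")  # -35483.09 ms (-12.8%)
```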

2. Stage Breakdown

| Stage Name | Baseline (ms) | New (ms) | Diff (ms) | Diff (%) | Status |
| --- | --- | --- | --- | --- | --- |
| InputValidationStage | 5.10 | 4.61 | -0.50 | -9.7% | ⚪️ |
| TextEncodingStage | 1461.92 | 1460.04 | -1.88 | -0.1% | ⚪️ |
| ImageEncodingStage | 861.02 | 914.68 | +53.66 | +6.2% | ⚪️ |
| ImageVAEEncodingStage | 4040.19 | 3997.95 | -42.24 | -1.0% | ⚪️ |
| LatentPreparationStage | 0.17 | 0.13 | -0.05 | -26.3% | ⚪️ |
| TimestepPreparationStage | 0.35 | 0.31 | -0.05 | -13.0% | ⚪️ |
| DenoisingStage | 266231.67 | 230743.97 | -35487.70 | -13.3% | 🟢 |
| DecodingStage | 4282.54 | 4278.92 | -3.63 | -0.1% | ⚪️ |

Output videos

AITER:
https://github.com/user-attachments/assets/62ae8e96-85e0-4b3f-b1dc-c15c708b4738

AITER SAGE:
https://github.com/user-attachments/assets/43f64ee6-dba4-45a5-9bca-8302bcfaff22

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions bot added labels documentation (Improvements or additions to documentation), amd, diffusion (SGLang Diffusion) on Mar 9, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the diffusion model's compatibility and performance by introducing native support for Sage attention on AMD GPUs. By implementing the AITER_SAGE backend, the system can now leverage AMD hardware for attention computations, which was previously limited to NVIDIA GPUs. This change not only broadens hardware support but also delivers a substantial improvement in overall inference latency, making the system more efficient.

Highlights

  • AMD GPU Support: Introduced the AITER_SAGE attention backend to enable Sage attention on AMD GPUs, addressing a previous limitation to NV GPUs.
  • Performance Improvement: Achieved a notable 12.8% reduction in end-to-end latency, primarily driven by a 13.3% improvement in the Denoising Stage.
  • New Backend Integration: Added AITER_SAGE as a new attention backend type and integrated it across various configuration and runtime files.


Changelog
  • docs/diffusion/performance/attention_backends.md
    • Documented the new aiter_sage attention backend.
    • Updated the aiter backend's GPU compatibility to include ROCm.
    • Added aiter_sage as a ROCm-supported backend.
  • python/sglang/multimodal_gen/configs/models/adapter/base.py
    • Registered AITER_SAGE as a supported attention backend for adapter architectures.
  • python/sglang/multimodal_gen/configs/models/dits/base.py
    • Registered AITER_SAGE as a supported attention backend for DiT architectures.
  • python/sglang/multimodal_gen/runtime/layers/attention/backends/aiter_sage.py
    • Implemented the AITERSageBackend and AITERSageImpl for AMD GPU attention, utilizing aiter.ops.triton.attention.fav3_sage.
  • python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py
    • Added AITER_SAGE to the set of supported attention backends for the QwenImageDiffusionModel.
  • python/sglang/multimodal_gen/runtime/platforms/interface.py
    • Extended the AttentionBackendEnum to include AITER_SAGE.
  • python/sglang/multimodal_gen/runtime/platforms/rocm.py
    • Configured the ROCm platform to select AITERSageBackend for AITER_SAGE, with dtype validation.
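The ROCm platform change described above (selecting `AITERSageBackend` with dtype validation) could be sketched roughly as follows. This is a hedged illustration, not the actual `rocm.py` code; the function name, dtype set, and return values are assumptions:

```python
# Illustrative sketch of ROCm backend selection with dtype validation.
# Real logic lives in python/sglang/multimodal_gen/runtime/platforms/rocm.py;
# all names here are hypothetical.
AITER_SAGE_SUPPORTED_DTYPES = {"float16", "bfloat16"}

def select_attention_backend(backend: str, dtype: str) -> str:
    if backend == "aiter_sage":
        # Sage-style quantized attention kernels assume half-precision inputs.
        if dtype not in AITER_SAGE_SUPPORTED_DTYPES:
            raise ValueError(
                f"aiter_sage requires fp16/bf16 activations, got {dtype}"
            )
        return "AITERSageBackend"
    return "AITERBackend"
```

Validating the dtype at selection time fails fast with a clear message instead of erroring deep inside the Triton kernel.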
Activity
  • The author has completed the code formatting, documentation updates, and provided accuracy and speed benchmark results.
  • Unit tests are pending.
  • The PR outlines a review process involving Merge Oncalls, CODEOWNERS, and CI tests.

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces support for the AITER_SAGE attention backend, which is targeted for AMD GPUs. The changes are well-contained and consistently applied across documentation, configuration, and implementation files. The new backend implementation correctly handles the optional aiter dependency. My primary feedback is to enhance the robustness of the new AITERSageImpl by adding explicit checks for unsupported attention features like causal masking, dropout, and grouped-query attention. This will prevent silent misconfigurations and make the backend safer to use.

Comment on lines +51 to +58
```python
try:
    from aiter.ops.triton.attention.fav3_sage import fav3_sage_wrapper_func

    self.aiter_sage_attn_fn = fav3_sage_wrapper_func
except ImportError:
    raise ImportError(
        "AITER Sage attention is not available, please update AITER version."
    )
```
Severity: high

The __init__ method accepts several parameters (causal, num_kv_heads, dropout_p, etc.) that are not used by the implementation. This could lead to silent misconfigurations where a user might expect a feature to be active (e.g., causal attention), but it is not applied. To make the implementation more robust, it's better to explicitly check for unsupported parameter values and raise an error.

```python
        if causal:
            raise NotImplementedError(
                "AITER Sage attention backend does not support causal attention."
            )
        if dropout_p > 0.0:
            raise NotImplementedError(
                "AITER Sage attention backend does not support dropout."
            )
        if num_kv_heads is not None and num_kv_heads != num_heads:
            raise NotImplementedError(
                "AITER Sage attention backend does not support Grouped Query Attention."
            )

        try:
            from aiter.ops.triton.attention.fav3_sage import fav3_sage_wrapper_func

            self.aiter_sage_attn_fn = fav3_sage_wrapper_func
        except ImportError:
            raise ImportError(
                "AITER Sage attention is not available, please update AITER version."
            )
```

@yhyang201 (Collaborator)

/tag-and-rerun-ci

@yhyang201 (Collaborator)

/rerun-failed-ci

@yhyang201 (Collaborator)

@mickqian All CI (Nvidia + AMD) passed and PR is approved, ready for merge

— SGLDHelper bot

@mickqian mickqian merged commit c8bbe50 into sgl-project:main Mar 11, 2026
143 of 151 checks passed
liubiyongge pushed a commit to liubiyongge/sglang that referenced this pull request Mar 13, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

Labels

amd diffusion SGLang Diffusion documentation Improvements or additions to documentation run-ci


3 participants