
[diffusion] [AMD] model: allow AITER backends in Flux 2 pipeline #22802

Merged
HaiShaw merged 4 commits into sgl-project:main from avjves:bug/flux_aiter
Apr 22, 2026

Conversation

@avjves (Contributor) commented Apr 14, 2026

Motivation

This PR is related to #22690.
PR #22423 specified the supported attention backends for Flux 2, but only included SDPA and FA, not AITER, which is a significant performance regression for AMD devices.

Modifications

  • Adds AITER and AITER_SAGE as supported attention backends for Flux 2 (a sketch of the change follows)
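
For orientation, here is a minimal, self-contained sketch of the shape of this change, based on the diff hunk quoted in the review below (lines +868 to +869 of flux_2.py); the stand-in enum and any members other than AITER and AITER_SAGE are assumptions, not the actual sglang code:

    from enum import Enum, auto

    # Minimal stand-in for sglang's AttentionBackendEnum; members other than
    # AITER and AITER_SAGE are assumptions for illustration only.
    class AttentionBackendEnum(Enum):
        TORCH_SDPA = auto()
        FA = auto()
        AITER = auto()
        AITER_SAGE = auto()

    # Shape of the change in Flux2Transformer2DModel (flux_2.py, +2/-0):
    _supported_attention_backends = (
        AttentionBackendEnum.TORCH_SDPA,
        AttentionBackendEnum.FA,
        AttentionBackendEnum.AITER,       # added by this PR
        AttentionBackendEnum.AITER_SAGE,  # added by this PR
    )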

Accuracy Tests

With SDPA as backend:
(output image: Add_a_cool_hat_to_the_cat_torch_sdpa)

With AITER as backend:
(output image: Add_a_cool_hat_to_the_cat_aiter)

Run command for both (the first image was generated without the fixes in this PR, the second with them):

sglang generate --model-path black-forest-labs/FLUX.2-dev --height 1024 --width 1024 --ulysses-degree 8 --ring-degree 1 --num-gpus 8 --num-inference-steps 50 --guidance-scale 4.0 --prompt "Add a cool hat to the cat" --dit-cpu-offload False --dit-layerwise-offload False --text-encoder-cpu-offload False --image-encoder-cpu-offload False --vae-cpu-offload False --warmup True --warmup-steps 2 --vae-precision bf16 --output-path /outputs/flux2.default --seed 42 --image-path /app/data/flux_cat.png --enable-torch-compile

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

github-actions (bot) added the diffusion (SGLang Diffusion) label on Apr 14, 2026
avjves changed the title from "[diffusion] allow AITER backends in Flux 2 pipeline" to "[diffusion] model: allow AITER backends in Flux 2 pipeline" on Apr 14, 2026
avjves changed the title from "[diffusion] model: allow AITER backends in Flux 2 pipeline" to "[diffusion] [AMD] model: allow AITER backends in Flux 2 pipeline" on Apr 14, 2026
@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds AITER and AITER_SAGE to the supported attention backends for the Flux2Transformer2DModel. However, feedback indicates that enabling these backends is premature due to critical integration issues, including a tensor layout mismatch in the AITER implementation, missing entries in the Ring Attention whitelist, and signature incompatibilities in the AITER_SAGE forward method.

Comment on lines +868 to +869:

    AttentionBackendEnum.AITER,
    AttentionBackendEnum.AITER_SAGE,

Severity: high

Enabling these backends for Flux2 is premature due to several integration issues in the underlying attention infrastructure:

  1. AITER Layout Mismatch (High Severity): The AITerImpl.forward implementation in aiter.py expects a [batch_size, num_heads, seq_len, head_dim] layout (as explicitly stated in its docstring and comments). However, USPAttention (which Flux2 uses) provides tensors in [batch_size, seq_len, num_heads, head_dim] format for both local and replicated-prefix paths. This will result in incorrect attention computation.
  2. Ring Attention Whitelist (Medium Severity): Both AITER and AITER_SAGE are currently missing from the Ring Attention whitelist in python/sglang/multimodal_gen/runtime/layers/attention/layer.py. Any attempt to use Flux2 with these backends and Ring Attention enabled will trigger a RuntimeError in USPAttention.__init__.
  3. AITER_SAGE Signature (Medium Severity): The AITERSageImpl.forward method does not accept **kwargs. If the Ring Attention whitelist is updated, ring_attn might still fail if it attempts to pass additional parameters (like dropout_p or is_causal) to the implementation.

Please ensure the backends in aiter.py, aiter_sage.py, and the whitelist in layer.py are updated to support the Flux2 pipeline requirements before enabling them here.
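
For readers unfamiliar with the two conventions, here is a minimal, self-contained illustration of the layouts being contrasted above (plain PyTorch with arbitrary shapes, not sglang code):

    import torch

    batch, seq_len, num_heads, head_dim = 2, 16, 8, 64

    # USPAttention-style layout: [batch_size, seq_len, num_heads, head_dim]
    q_bshd = torch.randn(batch, seq_len, num_heads, head_dim)

    # Layout attributed to AITerImpl.forward: [batch_size, num_heads, seq_len, head_dim]
    q_bhsd = q_bshd.transpose(1, 2)  # swap the seq_len and num_heads axes

    assert q_bhsd.shape == (batch, num_heads, seq_len, head_dim)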

@avjves (Contributor, Author) replied:

They are already supported; it's just the Flux DiT that had them disabled. I tested running with AITER just fine.

Collaborator replied:

Hmm, @avjves could you please provide some insight here? It would also be good to include example runs with the aiter and aiter_sage backends in the PR body.

@avjves (Contributor, Author) replied:

Sorry, my response to Gemini here maybe wasn't as clear as it should have been.

  1. The AITER implementation has a wrong docstring. It says "[batch_size, seq_len, head_num, head_dim]", but that layout would crash whenever AITER is used anywhere in the codebase. The seq_len and head_num dimensions are in reality swapped. As such, there is no layout mismatch here. I can change the docstring as well, though it's not really in the scope of this PR.
  2. A check for backends for ring-attn was added in [diffusion] Validate attention backend for Ring Attention in USPAttention #21828, but it doesn't include AITER. It should be added there as well, but again that's not in the scope of this PR, as it's not a Flux 2 issue per se. I try to avoid folding too many "unrelated" fixes into a PR after the fact, to keep the history clear. I do see this adding review overhead, so maybe it is better to bundle everything together and change the scope of the PR afterwards. (A sketch of the whitelist check follows this list.)
  3. This is a newer change, as most attention backends do not take extra arguments. This should be looked at when fixing 2).
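
As a minimal, self-contained sketch of the kind of whitelist check described in point 2 (the set contents, names, and error message are assumptions, not the actual layer.py code):

    # Hypothetical sketch of the Ring Attention backend check added in #21828.
    RING_ATTN_SUPPORTED_BACKENDS = {"FA", "TORCH_SDPA"}  # AITER absent, per point 2

    def validate_ring_attention_backend(backend_name: str) -> None:
        # USPAttention.__init__ reportedly raises when the backend is not whitelisted.
        if backend_name not in RING_ATTN_SUPPORTED_BACKENDS:
            raise RuntimeError(
                f"Attention backend {backend_name!r} is not supported with Ring Attention"
            )

    validate_ring_attention_backend("FA")       # passes
    # validate_ring_attention_backend("AITER")  # would raise RuntimeError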

Sure, I'll add some example outputs as well :)

@avjves (Contributor, Author) replied:

Added the output images as well :)

@avjves (Contributor, Author) replied:

Fixed the docstring, but I think 2) and 3) warrant their own PR.

@HaiShaw (Collaborator) left a comment

@avjves would you please address the review comments?

@bingxche (Collaborator) commented:

@amd-bot ci-status

@amd-bot commented Apr 16, 2026

@bingxche

CI Status for PR #22802

PR: [diffusion] [AMD] model: allow AITER backends in Flux 2 pipeline
Changed files: python/sglang/multimodal_gen/runtime/models/dits/flux_2.py (+2/-0)

AMD: 2 failures (0 likely related) | Others: 1 failure (0 related)

AMD CI Failures

  • Job: multimodal-gen-test-1-gpu-amd (partition 1)
    Test: test_diffusion_generation[fastwan2_2_ti2v_5b] in sglang/multimodal_gen/test/server/test_server_b.py
    Error: GPU Hang (exit code 134)
    Related? 🟢 Unlikely (GPU hardware hang during FastWan2.2-TI2V-5B video generation; unrelated model/codepath to flux_2.py)
  • Job: multimodal-gen-test-2-gpu-amd (partition 0)
    Test: test_diffusion_generation[flux_2_image_t2i_2_gpus] in sglang/multimodal_gen/test/server/test_server_2_gpu_b.py
    Error: RuntimeError: No available kernel in mistral_3.py:138
    Related? 🟢 Unlikely (Mistral 3 text encoder SDPA kernel issue on ROCm, fixed by PR #22690, merged Apr 16, after this CI ran Apr 14)
  • Job: multimodal-gen-test-2-gpu-amd (partition 0)
    Test: test_diffusion_generation[zimage_image_t2i_2_gpus] in sglang/multimodal_gen/test/server/test_server_2_gpu_b.py
    Error: HfHubHTTPError: 504 Gateway Time-out (Z-Image-Turbo download)
    Related? 🟢 Unlikely (HuggingFace Hub 504 timeout; infra issue)
  • Job: multimodal-gen-test-2-gpu-amd (partition 0)
    Test: test_diffusion_generation[flux_image_t2i_2_gpus] in sglang/multimodal_gen/test/server/test_server_2_gpu_b.py
    Error: HfHubHTTPError: 504 Gateway Time-out (FLUX.1-dev download)
    Related? 🟢 Unlikely (HuggingFace Hub 504 timeout; infra issue)

Other CI Failures

  • Job: pr-test-finish
    Test: N/A
    Error: Gate job: upstream call-gate reported failure
    Related? 🟢 Unlikely (gate aggregator; it fails because the AMD multimodal-gen jobs failed)

Details

All failures are unrelated to this PR's change (adding AITER and AITER_SAGE to _supported_attention_backends in Flux2Transformer2DModel):

  1. 1-GPU GPU Hang (fastwan2_2_ti2v_5b): The process crashed with HW Exception by GPU node-2 reason: GPU Hang during video generation with FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers. This is a hardware/infrastructure issue on the MI325X runner, completely unrelated to the Flux 2 model code. 9 of 10 tests in this partition passed before the hang.

  2. 2-GPU Mistral 3 SDPA failure (flux_2_image_t2i_2_gpus): The FLUX.2-dev model loaded successfully, but warmup failed with RuntimeError: No available kernel. Aborting execution. in mistral_3.py:138. On ROCm, SDPBackend.CUDNN_ATTENTION was being incorrectly applied because the platform check only looked at device.type == "cuda" (which is also true for ROCm/HIP). This is a known pre-existing bug fixed by PR [diffusion] model: Properly validate device for Mistral 3 attention #22690 (merged Apr 16); this CI ran on Apr 14, before that fix landed. (See the sketch after this list.)

  3. 2-GPU HuggingFace timeouts (zimage_image_t2i_2_gpus, flux_image_t2i_2_gpus): Both tests failed because HuggingFace Hub returned 504 Gateway Time-out when trying to download model files. Pure infrastructure flake.
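
Below is a minimal, self-contained sketch of the device-check pitfall described in point 2 above (plain PyTorch; not the actual mistral_3.py code, and the actual fix in #22690 may differ):

    import torch

    # On ROCm builds of PyTorch, devices still report device.type == "cuda",
    # so that check alone cannot rule out HIP.
    def is_rocm_build() -> bool:
        # torch.version.hip is a version string on ROCm builds, None on CUDA builds.
        return getattr(torch.version, "hip", None) is not None

    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

    # Buggy pattern as described: true on both CUDA and ROCm.
    looks_like_cuda = device.type == "cuda"

    # Safer: only treat the device as genuine CUDA (where cuDNN attention is
    # available) on a non-HIP build.
    genuine_cuda = looks_like_cuda and not is_rocm_build()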

Verdict: No action needed from the PR author. All failures are pre-existing bugs (now fixed) or infrastructure issues. Re-running CI after rebasing on latest main (which includes #22690) should resolve the flux_2_image_t2i_2_gpus failure.

Generated by amd-bot using Claude Code CLI

@HaiShaw (Collaborator) commented Apr 19, 2026

/tag-and-rerun-ci

@avjves (Contributor, Author) commented Apr 22, 2026

@HaiShaw Is everything now OK with this? Would be great to get it merged ASAP.

@HaiShaw (Collaborator) commented Apr 22, 2026

> @HaiShaw Is everything now OK with this? Would be great to get it merged ASAP.

Can you provide a full run example with the command (which would be useful for others to begin with)?

@avjves (Contributor, Author) commented Apr 22, 2026

> Can you provide a full run example with the command (which would be useful for others to begin with)?

Sure, added in the PR description. Posting here as well:

    sglang generate --model-path black-forest-labs/FLUX.2-dev \
        --height 1024 --width 1024 --ulysses-degree 8 --ring-degree 1 --num-gpus 8 \
        --num-inference-steps 50 --guidance-scale 4.0 --prompt "Add a cool hat to the cat" \
        --dit-cpu-offload False --dit-layerwise-offload False --text-encoder-cpu-offload False \
        --image-encoder-cpu-offload False --vae-cpu-offload False --warmup True --warmup-steps 2 \
        --vae-precision bf16 --output-path /outputs/flux2.default --seed 42 \
        --image-path /app/data/flux_cat.png --enable-torch-compile

The images shown in the description were generated with this command.

HaiShaw merged commit ac351c1 into sgl-project:main on Apr 22, 2026
65 of 69 checks passed

Labels

diffusion (SGLang Diffusion), run-ci
