[AMD] Diffusion Support on ROCm#13760
Conversation
Summary of Changes (Gemini Code Assist): This pull request significantly expands SGLang Diffusion's compatibility by integrating full support for AMD's ROCm platform. The changes enable efficient execution of diffusion models on AMD GPUs, primarily by adopting AITer as the default attention backend and implementing robust data type handling. It also streamlines the development environment setup with a new ROCm-specific Dockerfile and improves stability by localizing external dependencies and preventing profiler data conflicts. The overall impact is a more versatile and performant SGLang Diffusion for a broader range of hardware.
Code Review
This pull request introduces ROCm support for SGLang Diffusion, a significant step towards broader hardware compatibility. The changes include a new Dockerfile for ROCm, the integration of the AITer attention backend, and various code modifications to ensure compatibility and remove problematic dependencies on ROCm. My review has identified a couple of critical issues—one in the Dockerfile that would cause build failures and another in the AITer backend implementation that could lead to runtime errors. I have also provided suggestions to improve Dockerfile efficiency and documentation clarity. Overall, this is a valuable contribution that enables SGLang Diffusion on AMD hardware.
/tag-and-rerun-ci 11/26
Automatic Data Type Casting for AITer: I suggest falling back to SDPA instead of AITer in CLIP and other models, except the DiT part, to avoid image incorrectness.
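The suggestion above could be implemented roughly as follows. This is only a sketch: the function name, the string-based module matching, and the backend identifiers are assumptions for illustration, not the PR's actual code.

```python
def pick_attention_backend(module_name: str) -> str:
    """Sketch of the reviewer's suggestion: use AITer only for the DiT
    blocks and fall back to SDPA for encoders such as CLIP, where AITer
    may produce incorrect images."""
    # Matching on the name substring "dit" is purely illustrative; a real
    # implementation would dispatch on the module class instead.
    return "aiter" if "dit" in module_name.lower() else "sdpa"
```

For example, `pick_attention_backend("clip_text_encoder")` would select `"sdpa"`, while the DiT transformer blocks would keep AITer.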
You are the GOAT!
/rerun-failed-ci
Co-authored-by: Sabre Shao <sabre.shao@amd.com>
Co-authored-by: Yusheng (Ethan) Su <yushengsu.thu@gmail.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
- Fix GPU OOM in sequential tests on ROCm/AMD with explicit memory cleanup
- Skip Ring Attention tests on AMD/ROCm (unsupported)
- Fix SGLANG_TEST_OUTPUT_SIZE not applied to actual test requests
- Add MIOpen kernel caching for AMD VAE performance
- Add diagnostics for HF cache and system resources
- Add disk cleanup for non-persistent HF cache between tests
- Enable all diffusion tests including LoRA (except FLUX.2 on 1-GPU)
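The explicit memory cleanup mentioned in the first item might look like the sketch below; the exact hook used in the PR's test suite may differ, and the function name here is an assumption.

```python
import gc


def free_gpu_memory() -> None:
    """Sketch: explicit cleanup between sequential tests to mitigate GPU OOM
    on ROCm/AMD. ROCm builds of PyTorch expose the same torch.cuda API."""
    gc.collect()  # drop Python-side references to tensors first
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()   # release cached allocator blocks
            torch.cuda.synchronize()   # make sure pending work has finished
    except ImportError:
        pass  # torch not installed (e.g. docs environment); nothing to free


free_gpu_memory()
```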
The Docker image contains pre-compiled AITER kernels at /sgl-workspace/aiter/aiter/jit/ which may be incompatible. Clear them before running tests to force fresh JIT compilation.
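A minimal sketch of that cache-clearing step, using a stand-in directory (the real image path is `/sgl-workspace/aiter/aiter/jit/`, per the comment above; the `AITER_JIT_DIR` variable and `.so` pattern here are assumptions):

```shell
# Clear pre-compiled AITER JIT kernels so the next run recompiles them
# against the installed ROCm/AITER versions. Demo dir stands in for the
# real image path /sgl-workspace/aiter/aiter/jit/.
AITER_JIT_DIR="${AITER_JIT_DIR:-./aiter_jit_demo}"
mkdir -p "$AITER_JIT_DIR"
touch "$AITER_JIT_DIR/attn_fwd.so"           # stand-in for a stale kernel
find "$AITER_JIT_DIR" -name '*.so' -delete   # force fresh JIT compilation
```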
if self.height is None:
    self.height_not_provided = True
# Allow env var to override num_inference_steps (for faster CI testing on AMD)
Please fix this in a follow-up PR; this can be passed via sampling params.
Could you please provide more detail about this? Under which circumstances would this cause an issue? Thanks!
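For context, the env-var override under discussion could be implemented along these lines. This is a sketch only: the variable name and helper are assumptions, and, as the reviewer notes, passing the value via sampling params is the cleaner route.

```python
import os


def resolve_num_inference_steps(requested: int) -> int:
    # Hypothetical helper: an env var (name assumed) overrides the requested
    # step count, e.g. to shorten CI runs on AMD hardware.
    override = os.environ.get("SGLANG_NUM_INFERENCE_STEPS")
    return int(override) if override else requested
```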
Motivation
Support SGLang Diffusion on ROCm.
A more fine-grained follow-up to #13492. Thanks @sabreshao for the original support!
Also manually merged #13743.
Modifications
Make AITer the default attention backend on ROCm diffusion.
Avoid profiler trace overwrites by suffixing rank IDs.
Remove the dependency on installing yunchang and vendor the relevant function locally (a `torch not found` error would arise when installing yunchang on ROCm), to support sp_degree/ulysses_degree on ROCm.
Bump the openai version to 2.6.1 for all non-default pyproject files to fix bugs caused by an API change.
Fixed some CI issues.
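The profiler-trace change in the list above amounts to per-rank output file naming; a minimal sketch (the file-name pattern and helper name are assumptions, not the PR's actual code):

```python
import os


def trace_path(output_dir: str, rank: int) -> str:
    # Each rank writes its own trace file so concurrent ranks don't
    # overwrite a shared trace.json (pattern is illustrative).
    return os.path.join(output_dir, f"trace_rank{rank}.json")
```

With this scheme, rank 0 and rank 1 of the same run write `trace_rank0.json` and `trace_rank1.json` instead of clobbering one file.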
Note
Accuracy Tests
Tested on MI300 GPUs.
Benchmarking and Profiling
Checklist