[TRITON] Add FP8 support for gfx1200/gfx1201 #2621
Merged
brunomazzottiamd merged 1 commit into ROCm:main on Apr 9, 2026
Contributor
brunomazzottiamd
left a comment
LGTM! Let's wait for a thumbs up from @micmelesse before merging.
Contributor
Can you please rebase on top of latest
Contributor
Author
Done. Thanks for the review!
micmelesse
approved these changes
Apr 9, 2026
Contributor
micmelesse
left a comment
This looks good to me. Let us run it through CI, and if everything is green, we can merge it.
brunomazzottiamd
approved these changes
Apr 9, 2026
sunway513 pushed a commit that referenced this pull request on Apr 21, 2026
ClementLinCF pushed a commit that referenced this pull request on Apr 25, 2026
Liang-jianhao97 pushed a commit that referenced this pull request on Apr 30, 2026
Motivation
RDNA4 GPUs (`gfx1200`, `gfx1201`) support native FP8 operations but are not recognized as FP8-capable in aiter, causing a `RuntimeError: gfx1200 does not support FP8` when attempting to use FP8 with Flash Attention 3.

Technical Details
RDNA4 uses the standard IEEE/OCP `float8_e4m3fn` and `float8_e5m2` formats, identical to `gfx950` (MI350X), not the FNUZ variants used by `gfx942` (MI300X). No dtype replacement mapping is needed since the hardware natively supports the standard formats.

Three files changed:
- `aiter/ops/triton/utils/_triton/arch_info.py`: add `gfx1200`/`gfx1201` to `is_fp8_avail()`
- `aiter/ops/triton/utils/types.py`: add `gfx1200`/`gfx1201` to `get_fp8_dtypes()` and `get_fp8_e4m3_dtype()`
- `aiter/ops/triton/_triton_kernels/flash_attn_triton_amd/utils.py`: add `gfx1200`/`gfx1201` to `FP8_ARCHS`

Test Plan
Tested on AMD Radeon RX 9060 XT (`gfx1200`) running TheRock ROCm 7.13 on Windows via the FA3 interface.

Test Result
Although FA3 is not officially targeted at RDNA architectures, it can be successfully built. Benchmarking on RDNA4 indicates that its performance is roughly on par with FA2.
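
For readers unfamiliar with the arch-to-dtype selection described under Technical Details, the idea can be sketched roughly as follows. This is a minimal illustration, not the actual aiter code: the function names echo those listed in the PR, but their bodies, signatures, and return types here are assumptions, the set names `FNUZ_ARCHS` and `OCP_ARCHS` are hypothetical, and dtypes are represented as plain strings so the snippet runs without PyTorch.

```python
# Illustrative sketch only. The real aiter helpers may take no arguments,
# query the device themselves, and return torch dtypes; all of that is
# assumed away here.

# Per the PR description: gfx942 (MI300X) uses the FNUZ FP8 variants.
FNUZ_ARCHS = {"gfx942"}  # hypothetical name

# Per the PR description: gfx950 (MI350X) and RDNA4 use the standard formats.
OCP_ARCHS = {"gfx950", "gfx1200", "gfx1201"}  # hypothetical name


def is_fp8_avail(arch: str) -> bool:
    """Return True if the given architecture string is treated as FP8-capable."""
    return arch in FNUZ_ARCHS or arch in OCP_ARCHS


def get_fp8_dtypes(arch: str) -> tuple[str, str]:
    """Return (e5m2 dtype name, e4m3 dtype name) for the architecture."""
    if arch in FNUZ_ARCHS:
        return ("float8_e5m2fnuz", "float8_e4m3fnuz")
    if arch in OCP_ARCHS:
        # No dtype replacement: the standard formats are used as-is.
        return ("float8_e5m2", "float8_e4m3fn")
    raise RuntimeError(f"{arch} does not support FP8")


if __name__ == "__main__":
    for arch in ("gfx942", "gfx950", "gfx1200", "gfx1201"):
        print(arch, is_fp8_avail(arch), get_fp8_dtypes(arch))
```

Under these assumptions, adding an architecture to the standard-format set is enough for it to pass the availability check and pick up `float8_e4m3fn`/`float8_e5m2` without any FNUZ remapping, which matches the shape of the change the PR describes.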