[ROCm] fastSpecializedAtomicAdd for MI300 #135770
jeffdaily wants to merge 8 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135770
✅ No Failures as of commit 6df3cf1 with merge base 31c0467. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jianyuh has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hi Jeff, I tested it internally on ROCm 6.2.0 and the performance looks great, thanks! However, I noticed that the code specifies ROCM_VERSION >= 60201. Is this a requirement, or should it also work with 6.2.0?
@Mellonta it's very possible that our internal clang compiler is newer than the clang rpm in 6.2.0 |
While working on this PR I discovered a bug in the ROCm 6.2 compiler. To better support you, I got the release team to ship the fix as a patch in ROCm 6.2.1. On ROCm 6.2.0 the code compiles, but index_add results for bf16 and fp16 types will be garbage. That's why I guard it as requiring 6.2.1 or newer.
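For readers unfamiliar with the macro: ROCM_VERSION encodes major*10000 + minor*100 + patch, so ROCm 6.2.1 is 60201. Below is a minimal sketch of how such a version guard behaves; the demo program is hypothetical and only illustrates the version check, not the PR's actual code.

```cpp
// Hypothetical demo of the ROCM_VERSION guard discussed above.
// ROCM_VERSION encodes major*10000 + minor*100 + patch (6.2.1 -> 60201).
#include <cstdio>

#ifndef ROCM_VERSION
#define ROCM_VERSION 60201  // assumption for this demo: pretend ROCm 6.2.1
#endif

int main() {
#if ROCM_VERSION >= 60201
  // Fast path eligible: the packed fp16/bf16 compiler fix shipped in 6.2.1.
  std::printf("ROCm %d.%d.%d: packed fp16/bf16 fast path enabled\n",
              ROCM_VERSION / 10000, (ROCM_VERSION / 100) % 100,
              ROCM_VERSION % 100);
#else
  // 6.2.0 compiles the fast path but miscompiles it, so it must stay off.
  std::printf("ROCm < 6.2.1: generic atomics only\n");
#endif
  return 0;
}
```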
@xw285cornell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
xw285cornell left a comment:
The failed tests don't seem relevant.
@jianyuh @xw285cornell Build should be fixed now.
@xw285cornell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd. Helps with improving torch.scatter_add_ performance (https://ontrack-internal.amd.com/browse/SWDEV-497013), among others.

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
(cherry picked from commit d33a5e2; backported to the ROCm fork in #1746)
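To make "packed" concrete: two adjacent 16-bit values live in one aligned 32-bit word, so both lanes can be updated by a single atomic. Below is a hedged sketch of that technique using the portable 32-bit CAS fallback; packed_half2_atomic_add and scatter_add_pairs are hypothetical names, not PyTorch's fastSpecializedAtomicAdd. On MI300 the same update can instead be issued as one hardware packed-fp16 atomic, which is the whole point of the PR.

```cpp
// Sketch of a packed fp16 atomic add (hypothetical helper, not PyTorch's
// fastSpecializedAtomicAdd). Two __half lanes share one aligned 32-bit
// word; a CAS loop updates both at once. MI300 hardware can do this in a
// single packed atomic instruction instead of looping.
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

__device__ void packed_half2_atomic_add(__half2* addr, __half2 val) {
  unsigned int* word = reinterpret_cast<unsigned int*>(addr);
  unsigned int observed = *word;
  unsigned int assumed;
  do {
    assumed = observed;
    __half2 cur = *reinterpret_cast<const __half2*>(&assumed);
    __half2 sum = __hadd2(cur, val);  // add both 16-bit lanes at once
    unsigned int desired = *reinterpret_cast<unsigned int*>(&sum);
    // Publish only if nobody changed the word since we read it.
    observed = atomicCAS(word, assumed, desired);
  } while (observed != assumed);
}

// Tiny demo kernel: many threads accumulate into a few contended slots.
__global__ void scatter_add_pairs(__half2* out, const __half2* vals, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) packed_half2_atomic_add(&out[i % 4], vals[i]);
}
```

The design point the PR exploits is that the CAS loop above becomes unnecessary on MI300: the hardware accepts the packed 2x16-bit add directly, which is why routing ROCm through the existing fastSpecializedAtomicAdd path pays off for ops like scatter_add_ and index_add.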
cc @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd