[release/2.5] [ROCm] fastSpecializedAtomicAdd for MI300 (#135770) #1746
jithunnair-amd merged 1 commit into release/2.5
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd.
Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh
(cherry picked from commit d33a5e2)
90d5d10 to 0a25b38
cherry-pick -onto release/2.4
Created branch release/2.4_cherry-pick_pr-1746. It contains a merge conflict. Please resolve it
cherry-pick --onto release/2.4
Created branch release/2.4_cherry-pick_pr-1746 and #1769. It contains a merge conflict. Please resolve it
cherry-pick --onto release/2.4
Can't perform the cherry-pick keyword: unexpected error
cherry-pick --onto release/2.4
Can't perform the cherry-pick keyword: unexpected error
cherry-pick --onto release/2.4
Created branch release/2.4_cherry-pick_pr-1746 and null. It contains a merge conflict. Please resolve it
cherry-pick --onto release/2.4
Created branch release/2.4_cherry-pick_pr-1746 and null. It contains a merge conflict. Please resolve it
cherry-pick --onto release/2.4
Can't perform the cherry-pick keyword: unexpected error
cherry-pick --onto release/2.4
Can't perform the cherry-pick keyword: unexpected error
cherry-pick --onto release/2.4
Created branch release/2.4_cherry-pick_pr-1746 and #1774. It contains a merge conflict. Please resolve it
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd.
Improves torch.scatter_add_ performance, among other operations.
Pull Request resolved: pytorch#135770
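
For context on why atomic adds matter for torch.scatter_add_: several source elements can target the same output index, so a parallel GPU implementation has to accumulate them atomically. The following is a minimal pure-Python sketch of the 1-D semantics only. The helper name `scatter_add_1d` is hypothetical and is not part of PyTorch or of this change; it does not model the packed bfloat16/fp16 hardware path.

```python
def scatter_add_1d(out, index, src):
    """Reference semantics of out.scatter_add_(0, index, src) for 1-D inputs.

    Hypothetical helper for illustration; not PyTorch code.
    """
    if len(index) > len(src):
        raise ValueError("index must not be longer than src")
    for i, target in enumerate(index):
        # Several i may share the same target. On a GPU these updates run
        # concurrently, which is why each one must be an atomic add.
        out[target] += src[i]
    return out


# Indices 0 and 1 each receive two contributions.
result = scatter_add_1d([0.0] * 4, [0, 1, 0, 3, 1], [1.0, 2.0, 3.0, 4.0, 5.0])
print(result)  # [4.0, 7.0, 0.0, 4.0]
```

The colliding writes to indices 0 and 1 are the case where the cost of the atomic add dominates, which is presumably where a faster 16-bit atomic path helps most.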