[release/2.4] [ROCm] fastSpecializedAtomicAdd for MI300 (#135770) #1677

Merged
pruthvistony merged 1 commit into ROCm:release/2.4 from jerrymannil:release/2.4
Nov 5, 2024
@jerrymannil jerrymannil commented Nov 5, 2024

MI300 adds hardware support for packed bfloat16 and fp16. Enable it via the existing fastSpecializedAtomicAdd path.

This improves the performance of torch.scatter_add_, among other operations.
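
For readers unfamiliar with the "packed" idea, here is a minimal CPU-only sketch of adding to one 16-bit element by updating the 32-bit word that holds it and its neighbor. It is not the code from this PR or from PyTorch: `PackedBuffer`, `add_lanes`, and `packed_atomic_add` are hypothetical names, integer lanes stand in for fp16/bfloat16 so the example runs anywhere, and a compare-and-swap loop stands in for what a single packed hardware atomic would presumably do.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical buffer of 16-bit elements stored two per 32-bit word.
// Assumption: integer lanes replace fp16/bfloat16 so this runs on any CPU.
struct PackedBuffer {
    std::vector<std::atomic<std::uint32_t>> words;

    explicit PackedBuffer(std::size_t num_elements)
        : words((num_elements + 1) / 2) {
        for (auto& w : words) w.store(0);
    }

    // Read element i: even indices live in the low lane, odd in the high lane.
    std::uint16_t get(std::size_t i) const {
        std::uint32_t w = words[i / 2].load();
        return static_cast<std::uint16_t>((i % 2 == 0) ? (w & 0xFFFFu) : (w >> 16));
    }
};

// Lane-wise add of two packed pairs; each 16-bit lane wraps independently,
// so a carry out of the low lane never leaks into the high lane.
inline std::uint32_t add_lanes(std::uint32_t a, std::uint32_t b) {
    std::uint32_t lo = (a + b) & 0xFFFFu;
    std::uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    return (hi << 16) | lo;
}

// Add `value` to element i by atomically updating the whole 32-bit word.
inline void packed_atomic_add(PackedBuffer& buf, std::size_t i, std::uint16_t value) {
    assert(i / 2 < buf.words.size());
    // Put the value in the lane that matches element i; the other lane adds zero.
    std::uint32_t delta = (i % 2 == 0) ? std::uint32_t{value}
                                       : (std::uint32_t{value} << 16);
    std::atomic<std::uint32_t>& word = buf.words[i / 2];
    std::uint32_t expected = word.load();
    // The CAS loop emulates a single packed atomic add; on failure,
    // `expected` is refreshed and the new word is recomputed.
    while (!word.compare_exchange_weak(expected, add_lanes(expected, delta))) {
    }
}
```

The point of the sketch is the zero-padded pair: the neighboring lane receives an add of zero, so one word-sized atomic can update a single 16-bit element without disturbing the other.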

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh

@pruthvistony pruthvistony merged commit 619b266 into ROCm:release/2.4 Nov 5, 2024
@jithunnair-amd jithunnair-amd changed the title [ROCm] fastSpecializedAtomicAdd for MI300 (#135770) [release/2.4] [ROCm] fastSpecializedAtomicAdd for MI300 (#135770) Nov 23, 2024
jithunnair-amd pushed a commit that referenced this pull request Mar 17, 2025
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing
fastSpecializedAtomicAdd.

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh

Co-authored-by: Jeff Daily <jeff.daily@amd.com>