
[release/2.5] [ROCm] Improvements to non-vectorized elementwise kernels #1872

Merged
pruthvistony merged 1 commit into ROCm:release/2.5 from jerrymannil:release/2.5
Jan 31, 2025

Conversation

@jerrymannil
Collaborator

  • Unroll loops manually to hide memory access latency
  • Strided access for coalesced memory accesses

Co-authors: @akadutta @doru1004 @amd-hhashemi @carlobertolli
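
As a rough illustration of the two changes listed above (not the code from this PR), the following host-side C++ sketch emulates the thread indexing of an elementwise kernel. The names `kBlockSize`, `kUnroll`, `blocked_index`, `strided_index`, `elementwise_thread`, and `run_elementwise` are hypothetical, and the tiny block size is chosen only to keep the pattern readable. It assumes the strided scheme gives thread `t` the element `base + t + i * kBlockSize` on unrolled step `i`, so that neighbouring threads touch neighbouring addresses, and it separates the loads from the compute and store phase the way a manually unrolled loop would, so that several loads can be in flight at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameters for illustration only; not taken from the PR.
constexpr int kBlockSize = 4;  // emulated threads per block
constexpr int kUnroll = 4;     // elements handled by each thread

// Blocked layout: each thread owns kUnroll consecutive elements, so
// neighbouring threads are kUnroll elements apart on every step.
inline std::size_t blocked_index(int block, int thread, int i) {
    return (static_cast<std::size_t>(block) * kBlockSize + thread) * kUnroll + i;
}

// Strided layout: on step i, neighbouring threads read neighbouring
// elements, which is the pattern a GPU can coalesce into wide transactions.
inline std::size_t strided_index(int block, int thread, int i) {
    return static_cast<std::size_t>(block) * kBlockSize * kUnroll
         + static_cast<std::size_t>(thread)
         + static_cast<std::size_t>(i) * kBlockSize;
}

// Emulates the work of one thread: all loads are issued first, then the
// compute and stores follow. On a GPU these loops would be unrolled.
template <typename F>
void elementwise_thread(const std::vector<float>& in, std::vector<float>& out,
                        int block, int thread, F op) {
    float vals[kUnroll] = {};
    std::size_t idx[kUnroll] = {};
    for (int i = 0; i < kUnroll; ++i) {
        idx[i] = strided_index(block, thread, i);
        if (idx[i] < in.size()) vals[i] = in[idx[i]];  // bounds-checked load
    }
    for (int i = 0; i < kUnroll; ++i) {
        if (idx[i] < in.size()) out[idx[i]] = op(vals[i]);  // compute + store
    }
}

// Runs every emulated block and thread sequentially on the CPU.
template <typename F>
std::vector<float> run_elementwise(const std::vector<float>& in, F op) {
    std::vector<float> out(in.size(), 0.0f);
    const std::size_t per_block = static_cast<std::size_t>(kBlockSize) * kUnroll;
    const int blocks = static_cast<int>((in.size() + per_block - 1) / per_block);
    for (int b = 0; b < blocks; ++b) {
        for (int t = 0; t < kBlockSize; ++t) {
            elementwise_thread(in, out, b, t, op);
        }
    }
    return out;
}
```

With these indices, threads 0 and 1 of a block are one element apart on every step under the strided scheme, versus `kUnroll` elements apart under the blocked scheme. This sketch only checks the index arithmetic on the CPU; it makes no claim about the block size, unroll factor, or kernel structure actually used in the change.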

@jerrymannil
Collaborator Author

Upstream CI is passing. See pytorch#145635

@rocm-repo-management-api

Jenkins build for commit 612a6bb608d6a9a747cc4ef38122950755d58810 is in progress
Links: Blue Ocean view / Build artifacts

@pruthvistony pruthvistony merged commit 6a28181 into ROCm:release/2.5 Jan 31, 2025
@jerrymannil
Collaborator Author

!cherry-pick --onto rocm6.4_internal_testing

rocm-mici pushed a commit that referenced this pull request Jan 31, 2025
* Unroll loops manually to hide memory access latency
* Strided access for coalesced memory accesses

Co-authors: @akadutta @doru1004 @amd-hhashemi @carlobertolli
@rocm-mici

Created branch rocm6.4_internal_testing_cherry-pick_pr-1872 and #1873

@jerrymannil
Collaborator Author

!cherry-pick --onto release/2.5

@rocm-mici

Nothing to cherry-pick onto the release/2.5 branch

@BLOrange-AMD BLOrange-AMD changed the title from "[ROCm] Improvements to non-vectorized elementwise kernels" to "[release/2.5] [ROCm] Improvements to non-vectorized elementwise kernels" Feb 7, 2025
BLOrange-AMD added a commit that referenced this pull request Mar 3, 2025