[AUTOGENERATED] [rocm6.4_internal_testing] [ROCm] Improvements to non-vectorized elementwise kernels #1873

Closed
rocm-mici wants to merge 1 commit into rocm6.4_internal_testing from rocm6.4_internal_testing_cherry-pick_pr-1872

Conversation

@rocm-mici

Cherry-pick of #1872

* Unroll loops manually to hide memory access latency
* Strided access for coalesced memory accesses
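
For readers unfamiliar with the two techniques listed above, here is a minimal host-side C++ sketch of the indexing pattern they usually imply. It is not the code from #1872: the names `kBlockThreads`, `kUnroll`, `element_index`, `emulate_thread`, and `run_elementwise` are hypothetical, the block size is deliberately tiny, and GPU threads are emulated with plain loops. The idea is that each thread handles `kUnroll` elements spaced one block width apart, so in every unroll step neighboring threads touch neighboring addresses (coalesced), and all loads are issued before any compute so their latencies can overlap.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical parameters for illustration; not taken from the PR.
constexpr std::size_t kBlockThreads = 4;  // threads per block (tiny on purpose)
constexpr std::size_t kUnroll = 4;        // elements processed per thread

// Index touched by thread `tid` of block `block` in unroll step `j`.
// Consecutive threads map to consecutive indices within a step.
constexpr std::size_t element_index(std::size_t block, std::size_t tid, std::size_t j) {
    return block * kBlockThreads * kUnroll + j * kBlockThreads + tid;
}

// Emulates the work of one GPU thread. In a real kernel the two loops
// below would be written out by hand or fully unrolled by the compiler.
template <typename F>
void emulate_thread(std::size_t block, std::size_t tid,
                    const std::vector<float>& in, std::vector<float>& out, F f) {
    const std::size_t n = in.size();
    float vals[kUnroll] = {};

    // Phase 1: issue every load first so memory latencies can overlap.
    for (std::size_t j = 0; j < kUnroll; ++j) {
        const std::size_t idx = element_index(block, tid, j);
        if (idx < n) vals[j] = in[idx];
    }

    // Phase 2: apply the elementwise function and store the results.
    for (std::size_t j = 0; j < kUnroll; ++j) {
        const std::size_t idx = element_index(block, tid, j);
        if (idx < n) out[idx] = f(vals[j]);
    }
}

// Runs every emulated block and thread sequentially on the host.
template <typename F>
std::vector<float> run_elementwise(const std::vector<float>& in, F f) {
    std::vector<float> out(in.size(), 0.0f);
    const std::size_t per_block = kBlockThreads * kUnroll;
    const std::size_t blocks = (in.size() + per_block - 1) / per_block;
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < kBlockThreads; ++t) {
            emulate_thread(b, t, in, out, f);
        }
    }
    return out;
}
```

The bounds check on every access handles a tail that does not fill a whole block. The alternative layout, where each thread reads `kUnroll` contiguous elements, would make neighboring threads access addresses `kUnroll` elements apart, which is the uncoalesced pattern the strided scheme avoids.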

Co-authors: @akadutta @doru1004 @amd-hhashemi @carlobertolli

@jerrymannil deleted the rocm6.4_internal_testing_cherry-pick_pr-1872 branch on January 31, 2025 19:12