Fix warp size #256
Merged
amd-sriram merged 9 commits into master on Jul 15, 2025
Conversation
Collaborator (Author)
! cherry-pick --onto release/1.4.0 release/1.5.0 release/1.6.0 release/1.7.0
okakarpa pushed a commit that referenced this pull request on Jul 15, 2025
* replace c10_warp_size in fused rope
* replace c10_warp_size in fused softmax
* replace c10_warp_size in group batch norm
* replace c10_warp_size in multiheadattention
* replace c10_warp_size in transducer
* replace c10_warp_size in xentropy
* replace c10_warp_size in sync batch normalization
* replace c10_warp_size in group batch norm
* replace warp_size in multihead attention
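Each of these commits applies the same mechanical change in a different module: host-side launch math stops using a compile-time warp-size constant and queries the runtime instead. A minimal sketch of the pattern, with hypothetical kernel/launcher names (the real apex sources differ):

```cpp
// Sketch of the host-side replacement pattern; names are illustrative.
#include <ATen/cuda/CUDAContext.h>  // at::cuda::warp_size()

// Before: the launch geometry baked in a compile-time constant, which can
// disagree with the device the binary actually runs on (32 lanes on NVIDIA
// and AMD RDNA, 64 on AMD CDNA/GFX9):
//   dim3 threads(C10_WARP_SIZE, batches_per_block);

// After: query the warp size of the current device at runtime.
void launch_fused_softmax_sketch(int batches_per_block, dim3& threads) {
  const int warp_size = at::cuda::warp_size();  // 32 or 64 depending on GPU
  threads = dim3(warp_size, batches_per_block);
}
```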
okakarpa pushed a commit that referenced this pull request on Jul 15, 2025
* replace c10_warp_size in fused rope
* replace c10_warp_size in fused softmax
* replace c10_warp_size in group batch norm
* replace c10_warp_size in multiheadattention
* replace c10_warp_size in transducer
* replace c10_warp_size in xentropy
* replace c10_warp_size in sync batch normalization
* replace c10_warp_size in group batch norm
* replace warp_size in multihead attention
okakarpa pushed a commit that referenced this pull request on Jul 15, 2025
* replace c10_warp_size in fused rope
* replace c10_warp_size in fused softmax
* replace c10_warp_size in group batch norm
* replace c10_warp_size in multiheadattention
* replace c10_warp_size in transducer
* replace c10_warp_size in xentropy
* replace c10_warp_size in sync batch normalization
* replace c10_warp_size in group batch norm
* replace warp_size in multihead attention
Collaborator
Created branch autogenerated/release/1.4.0_cherry-pick_pr-256 and #257. It contains a merge conflict. Please resolve it.
Created branch autogenerated/release/1.5.0_cherry-pick_pr-256 and #258.
Created branch autogenerated/release/1.6.0_cherry-pick_pr-256 and #259.
Created branch autogenerated/release/1.7.0_cherry-pick_pr-256 and #260.
amd-sriram added a commit that referenced this pull request on Jul 15, 2025
* replace c10_warp_size in fused rope
* replace c10_warp_size in fused softmax
* replace c10_warp_size in group batch norm
* replace c10_warp_size in multiheadattention
* replace c10_warp_size in transducer
* replace c10_warp_size in xentropy
* replace c10_warp_size in sync batch normalization
* replace c10_warp_size in group batch norm
* replace warp_size in multihead attention

Co-authored-by: Sriram Kumar <skishore@amd.com>
amd-sriram added a commit that referenced this pull request on Jul 15, 2025
* replace c10_warp_size in fused rope
* replace c10_warp_size in fused softmax
* replace c10_warp_size in group batch norm
* replace c10_warp_size in multiheadattention
* replace c10_warp_size in transducer
* replace c10_warp_size in xentropy
* replace c10_warp_size in sync batch normalization
* replace c10_warp_size in group batch norm
* replace warp_size in multihead attention

Co-authored-by: Sriram Kumar <skishore@amd.com>
amd-sriram added a commit that referenced this pull request on Jul 15, 2025
* replace c10_warp_size in fused rope
* replace c10_warp_size in fused softmax
* replace c10_warp_size in group batch norm
* replace c10_warp_size in multiheadattention
* replace c10_warp_size in transducer
* replace c10_warp_size in xentropy
* replace c10_warp_size in sync batch normalization
* replace c10_warp_size in group batch norm
* replace warp_size in multihead attention

Co-authored-by: Sriram Kumar <skishore@amd.com>
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request on Jul 22, 2025
Fix all warp sizes in apex: ROCm/apex@051cba7 (ROCm/apex#256). Fixes: https://ontrack-internal.amd.com/browse/SWDEV-541725
Replace uses of C10_WARP_SIZE, the constant WARP_SIZE, the constant THREADS_PER_WARP, and hard-coded warp sizes found with the regex warp.*32 with at::cuda::warp_size(), wherever they occur in host code (i.e., not inside __device__ or __global__ functions).
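The device-side exception matters: __global__ and __device__ code is compiled once per target architecture, so the compile-time constant resolves correctly there and is left alone. A hedged sketch of that distinction, with a hypothetical kernel not taken from this diff:

```cpp
#include <c10/macros/Macros.h>  // C10_WARP_SIZE

// Inside device code the macro remains appropriate: each code object is
// compiled for a specific architecture, where C10_WARP_SIZE resolves to the
// right value (32 on CUDA, 64 on GFX9 ROCm). Only host-side uses are
// switched to the runtime query at::cuda::warp_size().
template <typename T>
__global__ void warp_reduce_sketch(const T* in, T* out, int n) {
  __shared__ T smem[C10_WARP_SIZE];  // compile-time size, per-arch correct
  // ... warp-level reduction body elided ...
}
```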
Tested with docker image registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16420_ubuntu22.04_py3.10_pytorch_lw_rocm7.0_internal_testing_2d567672

Affected extensions:
The following UTs pass:
Cherry-picked to release/1.4.0 branch via #257
Cherry-picked to release/1.5.0 branch via #258
Cherry-picked to release/1.6.0 branch via #259
Cherry-picked to release/1.7.0 branch via #260