
Fix warp size#256

Merged
amd-sriram merged 9 commits into master from fix_warp_size
Jul 15, 2025
Conversation

Collaborator

amd-sriram commented Jul 15, 2025

Replace uses of C10_WARP_SIZE, the constants WARP_SIZE and THREADS_PER_WARP, and occurrences matching the warp.*32 regex with at::cuda::warp_size(), wherever they are not used inside device or global functions.
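The host/device distinction matters because C10_WARP_SIZE is a compile-time constant (32 on CUDA, 64 on ROCm wavefront-64 hardware), while host code is compiled once and must query the active device at runtime. A minimal sketch of the pattern, assuming a hypothetical kernel and launcher (names are illustrative, not the actual diff):

```cuda
// Sketch only: illustrates the host-vs-device warp-size pattern,
// not the exact code changed in this PR.
#include <ATen/cuda/CUDAContext.h>

__global__ void softmax_kernel(/* ... */) {
  // Device side: device code is compiled per-architecture, so a
  // compile-time warp-size constant (e.g. C10_WARP_SIZE) remains valid here.
}

void launch_softmax_kernel(/* ... */) {
  // Host side: query the warp size of the current device at runtime
  // instead of hard-coding 32, so block shapes are correct on ROCm too.
  const int warp_size = at::cuda::warp_size();
  dim3 block(warp_size, 4);
  // ... grid computation and kernel launch derived from warp_size ...
}
```

Keeping the constant in __device__/__global__ code while switching host-side launch math to at::cuda::warp_size() avoids wrong block geometry on 64-wide AMD wavefronts without adding a runtime query inside kernels.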

Tested with docker
registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16420_ubuntu22.04_py3.10_pytorch_lw_rocm7.0_internal_testing_2d567672

Affected extensions:

  1. fused rope - csrc/megatron/fused_rotary_positional_embedding.h
  2. scaled_masked_softmax_cuda - csrc/megatron/scaled_masked_softmax.h
  3. generic_scaled_masked_softmax_cuda - csrc/megatron/generic_scaled_masked_softmax.h
  4. scaled_upper_triang_masked_softmax_cuda - csrc/megatron/scaled_upper_triang_masked_softmax.h
  5. group batch norm - apex/contrib/csrc/groupbn/batch_norm.h, apex/contrib/csrc/groupbn/batch_norm_add_relu.h
  6. multihead attention - apex/contrib/csrc/multihead_attn/softmax.cuh
  7. transducer - apex/contrib/csrc/transducer/transducer_joint_kernel.cu
  8. xentropy - apex/contrib/csrc/xentropy/xentropy_kernel.cu
  9. sync batch norm - csrc/welford.cu

The following UTs pass:

  1. tests/L0/run_transformer/test_fused_rope.py
  2. tests/L0/run_transformer/test_fused_softmax.py
  3. apex/contrib/test/groupbn/test_groupbn.py
  4. apex/contrib/test/groupbn/test_groupbn_channel_last.py
  5. apex/contrib/test/multihead_attn/test_mha_fused_softmax.py
  6. apex/contrib/test/transducer/test_transducer_joint.py
  7. apex/contrib/test/transducer/test_transducer_loss.py
  8. apex/contrib/test/test_label_smoothing.py
  9. tests/distributed/synced_batchnorm/python_single_gpu_unit_test.py
  10. tests/distributed/synced_batchnorm/single_gpu_unit_test.py
  11. tests/distributed/synced_batchnorm/test_batchnorm1d.py

Cherry-picked to release/1.4.0 branch via #257

Cherry-picked to release/1.5.0 branch via #258

Cherry-picked to release/1.6.0 branch via #259

Cherry-picked to release/1.7.0 branch via #260

@amd-sriram amd-sriram self-assigned this Jul 15, 2025
@amd-sriram amd-sriram merged commit 051cba7 into master Jul 15, 2025
@amd-sriram amd-sriram deleted the fix_warp_size branch July 15, 2025 15:37
amd-sriram (Collaborator, Author) commented:

! cherry-pick --onto release/1.4.0 release/1.5.0 release/1.6.0 release/1.7.0

okakarpa pushed a commit that referenced this pull request Jul 15, 2025
* replace c10_warp_size in fused rope

* replace c10_warp_size in fused softmax

* replace c10_warp_size in group batch norm

* replace c10_warp_size in multiheadattention

* replace c10_warp_size in transducer

* replace c10_warp_size in xentropy

* replace c10_warp_size in sync batch normalization

* replace c10_warp_size in group batch norm

* replace warp_size in multihead attention
okakarpa pushed a commit that referenced this pull request Jul 15, 2025 (same commit message as above)
okakarpa pushed a commit that referenced this pull request Jul 15, 2025 (same commit message as above)
okakarpa (Collaborator) commented:

Created branch autogenerated/release/1.4.0_cherry-pick_pr-256 and #257. It contains a merge conflict; please resolve it.

Created branch autogenerated/release/1.5.0_cherry-pick_pr-256 and #258

Created branch autogenerated/release/1.6.0_cherry-pick_pr-256 and #259

Created branch autogenerated/release/1.7.0_cherry-pick_pr-256 and #260

amd-sriram added a commit that referenced this pull request Jul 15, 2025
* replace c10_warp_size in fused rope

* replace c10_warp_size in fused softmax

* replace c10_warp_size in group batch norm

* replace c10_warp_size in multiheadattention

* replace c10_warp_size in transducer

* replace c10_warp_size in xentropy

* replace c10_warp_size in sync batch normalization

* replace c10_warp_size in group batch norm

* replace warp_size in multihead attention

Co-authored-by: Sriram Kumar <skishore@amd.com>
amd-sriram added a commit that referenced this pull request Jul 15, 2025 (same commit message and co-author as above)
amd-sriram added a commit that referenced this pull request Jul 15, 2025 (same commit message and co-author as above)