[release/1.7.0] Added AITER as a submodule and use in fused_rope.py #226
Merged
amd-sriram merged 11 commits into release/1.7.0 on Jul 9, 2025
…d rope test, reduced tolerances according to unit test in aiter repo.
…r backend if it is rocm and aiter is installed
…y error - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Collaborator
This PR should be tested with release/2.7 (both ROCm fork and upstream) to ensure it's compatible.
Collaborator, Author
@jithunnair-amd I have tested with release/2.7 (ROCm and upstream). I also updated the description.
pruthvistony approved these changes Jul 8, 2025
Wait for PR #222 to be merged before this.
…nd use pip install -e . instead of python setup.py develop for installing aiter.
…nc and select apex or aiter subclass based on AITER_ROPE_BACKEND value. The user can specify the environment variable USE_ROCM_AITER_ROPE_BACKEND to select between aiter and apex backends for fused rope.
…est otherwise use the original precision 1e-3
remove spaces
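The commit messages above mention picking the apex or aiter implementation from the USE_ROCM_AITER_ROPE_BACKEND environment variable. A minimal sketch of that kind of switch follows. The class names, the accepted truthy strings, and the default (use aiter when it is available) are assumptions for illustration and are not taken from the PR diff.

```python
import os


# Hypothetical stand-ins for the two implementations; the real class
# names in fused_rope.py are not shown on this page.
class ApexFusedRoPE:
    name = "apex"


class AiterFusedRoPE:
    name = "aiter"


def env_flag(name: str, default: bool) -> bool:
    """Read an environment variable as a boolean (accepted values are an assumption)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def select_rope_backend(aiter_available: bool):
    """Return the aiter class only if the user allows it and aiter is usable."""
    use_aiter = env_flag("USE_ROCM_AITER_ROPE_BACKEND", default=aiter_available)
    if use_aiter and aiter_available:
        return AiterFusedRoPE
    return ApexFusedRoPE
```

With this shape, setting USE_ROCM_AITER_ROPE_BACKEND=0 forces the apex kernels even on a machine where aiter is installed, which is handy when comparing the two backends in the unit test.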
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Jul 14, 2025
Fixing the C10_warpsize issue. Replacing the macros with at::cuda::warp_size() - ROCm/apex#244
[[release/1.7.0] Added AITER as a submodule and use in fused_rope.py](ROCm/apex@53f3c64) - ROCm/apex#226
[Replaced warpsize with C10_WARP_SIZE](ROCm/apex@f417097) - ROCm/apex#253
[Disabling Aiter Installation in default build](ROCm/apex@1c50337) - ROCm/apex#255
Fixes https://ontrack-internal.amd.com/browse/SWDEV-496182
Added AITER support in fused_rope.py for all 4 variants. Updated the fused rope test and reduced the tolerances to match the unit test in the aiter repo.
Unit test run: python tests/L0/run_transformer/test_fused_rope.py
Added aiter as a submodule; setup.py builds it when the build targets ROCm.
On ROCm, fused rope uses the AITER backend.
On CUDA, it uses the apex native kernels.
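The platform split can be sketched as a small detection helper. This is an illustration only, not the code from the PR: the function names are hypothetical, and it assumes the check is "PyTorch is a ROCm build and the aiter package is importable", as one of the commit messages suggests. It relies on torch.version.hip, which is None on CUDA builds of PyTorch.

```python
import importlib.util


def is_rocm_build() -> bool:
    """True when the installed PyTorch is a ROCm (HIP) build."""
    if importlib.util.find_spec("torch") is None:
        return False  # no PyTorch at all, so certainly not a ROCm build
    import torch

    return getattr(torch.version, "hip", None) is not None


def aiter_installed() -> bool:
    """True when the aiter package can be found on the import path."""
    return importlib.util.find_spec("aiter") is not None


def default_rope_backend() -> str:
    """Pick "aiter" only on ROCm with aiter present; otherwise fall back to "apex"."""
    return "aiter" if is_rocm_build() and aiter_installed() else "apex"
```

Falling back to "apex" whenever either condition fails keeps CUDA builds, and ROCm builds without aiter, on the native kernels.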
Tested with release/2.7 (ROCm fork and upstream).
Fixes: https://ontrack-internal.amd.com/browse/SWDEV-496182
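For the build step, the commits mention installing aiter with pip install -e . rather than python setup.py develop. A rough sketch of how a setup script might do that on ROCm only is below. The submodule path third_party/aiter and the helper names are hypothetical, and the actual setup.py in this PR may be structured differently.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Hypothetical submodule location; the real path is not shown on this page.
AITER_DIR = Path("third_party") / "aiter"


def aiter_install_command() -> List[str]:
    """Editable install with the interpreter that is running the build."""
    return [sys.executable, "-m", "pip", "install", "-e", "."]


def install_aiter(aiter_dir: Path = AITER_DIR, is_rocm: bool = False) -> bool:
    """Install aiter on ROCm builds only; return True if an install was run."""
    if not is_rocm or not aiter_dir.is_dir():
        return False  # CUDA build, or the submodule was not checked out
    subprocess.check_call(aiter_install_command(), cwd=aiter_dir)
    return True
```

Invoking pip through sys.executable keeps the install in the same environment as the apex build, which helps avoid mismatched-binary problems such as the numpy dtype size error quoted in the commit list.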