[ROCm] Fix ADDMM hipBLASLt regression #138267
naromero77amd wants to merge 8 commits into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138267
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit aec924b with merge base dbd6ada.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@pytorchbot label "topic: not user facing" |
|
❌ 🤖 pytorchbot command failed: |
|
@pytorchbot label "topic: not user facing" |
|
@pytorchbot merge -f "Lint + ROCM builds are fine" |
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
@pytorchbot revert -c nosignal -m "this PR went too far when partially reverting #137604; the env var default should be the same on ROCm and CUDA" |
|
@pytorchbot successfully started a revert job. Check the current status here. |
This reverts commit 14a3e12. Reverted #138267 on behalf of https://github.com/jeffdaily because this PR went too far when partially reverting #137604; the env var default should be the same on ROCm and CUDA (see the comment on #138267).
|
@naromero77amd your PR has been successfully reverted. |
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
cb023ce to d2e7c17
Sorry, I meant checking in an actual test to safeguard against the same failure happening in the future |
Thank you for bringing up testing. The right way to test this is on a gfx110x GPU, which we don't have in upstream CI yet, though I have been told it is on the roadmap. I discussed this with @jeffdaily, and we do need to remove one existing test case, as it is no longer appropriate. |
|
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Port the PR pytorch#138267 from upstream main to fix the error "RuntimeError: Attempting to use hipBLASLt on a unsupported architecture!"
Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
Fixes #138067
This is a partial reversion of #137604.
The breakage is on AMD GPUs that do not fully support hipBLASLt, e.g. gfx1100.
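The kind of architecture gate implied by this breakage can be sketched in plain Python. This is only an illustration under stated assumptions, not the code changed in this PR: the allow-list contents, the helper names `supports_hipblaslt` and `pick_addmm_backend`, and the backend labels are all hypothetical. The idea is that a GPU outside the allow-list (such as gfx1100) falls back to the default BLAS path instead of raising the "unsupported architecture" error.

```python
# Hypothetical allow-list for illustration only; the real set of
# hipBLASLt-capable architectures is defined inside PyTorch's ROCm
# backend and may differ from this one.
HIPBLASLT_ARCHS = ("gfx90a", "gfx942")


def supports_hipblaslt(arch_name: str) -> bool:
    """Return True if the base architecture is in the allow-list."""
    # ROCm can report names such as "gfx90a:sramecc+:xnack-";
    # only the part before the first ":" identifies the architecture.
    base = arch_name.split(":", 1)[0]
    return base in HIPBLASLT_ARCHS


def pick_addmm_backend(arch_name: str, prefer_lt: bool = True) -> str:
    """Choose a backend label, falling back instead of raising an error."""
    if prefer_lt and supports_hipblaslt(arch_name):
        return "hipblaslt"
    # Unsupported architecture (or the user opted out): use the default path.
    return "default"


print(pick_addmm_backend("gfx1100"))
print(pick_addmm_backend("gfx90a:sramecc+:xnack-"))
```

The fallback is silent here by design: a preference for hipBLASLt is treated as a hint, so an unsupported GPU keeps working rather than failing at the first matrix multiply.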
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang