[ROCm] [BUGFIX] Re-enable rocm-specific tuning parameters v2 (#133852) #136139
kit1980 merged 1 commit into pytorch:release/2.5
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136139
❌ 1 New Failure, 3 Unrelated Failures as of commit 683c494 with merge base b7eb725.
NEW FAILURE: one job has failed.
BROKEN TRUNK: the following jobs failed but were also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jataylo I see ROCm failures on this PR. Are they related?
Hi @jataylo, these errors don't look related, but can you please confirm?
[rocm / linux-focal-rocm6.1-py3.8 / test (default, 1, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199911968) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199911968)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886643882))
inductor/test_flex_decoding.py::TestFlexDecoding::test_builtin_score_mods_bfloat16_score_mod0_head_dims1
[rocm / linux-focal-rocm6.1-py3.8 / test (default, 3, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199912800) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199912800)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886644164))
inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t
[rocm / linux-focal-rocm6.1-py3.8 / test (default, 4, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199913164) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199913164)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886645016))
inductor/test_flex_decoding.py::TestFlexDecoding::test_builtin_score_mods_bfloat16_score_mod0_head_dims0
I can confirm these failures are unrelated: https://github.com/pytorch/pytorch/pull/136557/files/16ebb15a8d8de4200fddd5c7b7cb8143a834994c..8f94eaaf3da2977e90aef4df9816d0c88fc74da8. This cherry pick will resolve the fp8 failures. For the flex-decode failures I'm not sure what the root cause was (cc: @amdfaa @jithunnair-amd @jerrymannil), but they are not related to this change.
Verified in the torch 2.5 final RC wheel.
Small bug fix: #124592 replaced the `torch.version.hip` check with `device_props` but made a mistake in porting the original logic.

The original code was:

`if torch.version.hip is not None:`

which was incorrectly replaced by:

`if self.device_props.type != "hip":`

Another occurrence of #130617.
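To illustrate why the port inverted the gate, here is a minimal, self-contained sketch (hypothetical names; the real check lives in PyTorch's inductor code). The ported `!=` comparison fires for every non-HIP device and skips HIP devices, which is the opposite of the original `torch.version.hip is not None` condition and is why the ROCm-specific tuning parameters were silently disabled:

```python
# Hypothetical stand-in for the device properties object used by inductor.
class DeviceProps:
    def __init__(self, type: str):
        self.type = type

def uses_rocm_tuning_buggy(props: DeviceProps) -> bool:
    # Buggy port: `!=` selects every NON-hip device and skips hip devices,
    # the opposite of the original `torch.version.hip is not None` gate.
    return props.type != "hip"

def uses_rocm_tuning_fixed(props: DeviceProps) -> bool:
    # Corrected logic: apply ROCm-specific tuning exactly on hip devices.
    return props.type == "hip"

hip, cuda = DeviceProps("hip"), DeviceProps("cuda")
print(uses_rocm_tuning_buggy(hip), uses_rocm_tuning_fixed(hip))    # False True
print(uses_rocm_tuning_buggy(cuda), uses_rocm_tuning_fixed(cuda))  # True False
```

On a ROCm build the buggy condition evaluates to `False`, so the tuning branch was never taken; flipping the comparison restores the original behavior.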
Pull Request resolved: #133852
Approved by: https://github.com/masnesral, https://github.com/malfet
(cherry picked from commit da587de)
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @hongxiayang @naromero77amd @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang