[ROCm] [BUGFIX] Re-enable rocm-specific tuning parameters v2 (#133852)#136139

Merged
kit1980 merged 1 commit into pytorch:release/2.5 from jataylo:rel25-rocm-param-fix on Sep 25, 2024

@jataylo (Collaborator) commented Sep 16, 2024

Small bug fix: #124592 replaced the `torch.version.hip` check with `device_props`, but made a mistake in porting the original logic.

The original code was:
`if torch.version.hip is not None:`

Which was incorrectly replaced by the inverted check:
`if self.device_props.type != "hip":`

Another occurrence of #130617.
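
A minimal sketch of why the ported condition matters. Everything here is illustrative: `FakeDeviceProps`, `rocm_options_buggy`, and `rocm_options_fixed` are hypothetical names, and copying `waves_per_eu` into an options dict is an assumption about what the guarded block does. Only the two `if` conditions and the `"waves_per_eu" in compile_meta` check come from this PR.

```python
from dataclasses import dataclass


@dataclass
class FakeDeviceProps:
    """Hypothetical stand-in for the object behind self.device_props."""
    type: str


def rocm_options_buggy(props, compile_meta):
    """Mirrors the incorrectly ported check: the branch runs on non-HIP devices."""
    options = {}
    if props.type != "hip":  # inverted condition introduced by the port
        if "waves_per_eu" in compile_meta:
            # Assumed behavior: forward the ROCm-specific tuning parameter.
            options["waves_per_eu"] = compile_meta["waves_per_eu"]
    return options


def rocm_options_fixed(props, compile_meta):
    """Mirrors the corrected check: the branch runs only on HIP devices."""
    options = {}
    if props.type == "hip":
        if "waves_per_eu" in compile_meta:
            options["waves_per_eu"] = compile_meta["waves_per_eu"]
    return options


if __name__ == "__main__":
    meta = {"waves_per_eu": 2}
    for device_type in ("hip", "cuda"):
        props = FakeDeviceProps(type=device_type)
        print(device_type, rocm_options_buggy(props, meta), rocm_options_fixed(props, meta))
```

Under these assumptions, the buggy variant drops the parameter on a `"hip"` device and applies it on every other device type, which is the opposite of the original `torch.version.hip is not None` behavior.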

Pull Request resolved: #133852
Approved by: https://github.com/masnesral, https://github.com/malfet

(cherry picked from commit da587de)

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @hongxiayang @naromero77amd @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@jataylo jataylo requested review from atalman and malfet September 16, 2024 12:32
@pytorch-bot bot commented Sep 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136139

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit 683c494 with merge base b7eb725:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ciflow/inductor, ciflow/rocm (Trigger "default" config CI on ROCm), module: inductor, and module: rocm (AMD GPU support for Pytorch) labels Sep 16, 2024
@kit1980 (Contributor) commented Sep 20, 2024

@jataylo I see rocm failures on this PR, related?

@atalman (Contributor) left a comment

Hi @jataylo, these errors don't look related, but can you please confirm?

[rocm / linux-focal-rocm6.1-py3.8 / test (default, 1, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199911968) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199911968)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886643882))
inductor/test_flex_decoding.py::TestFlexDecoding::test_builtin_score_mods_bfloat16_score_mod0_head_dims1
[rocm / linux-focal-rocm6.1-py3.8 / test (default, 3, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199912800) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199912800)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886644164))
inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t
[rocm / linux-focal-rocm6.1-py3.8 / test (default, 4, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199913164) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199913164)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886645016))
inductor/test_flex_decoding.py::TestFlexDecoding::test_builtin_score_mods_bfloat16_score_mod0_head_dims0

@atalman atalman self-requested a review September 24, 2024 17:54
@pruthvistony pruthvistony added the rocm (This tag is for PRs from ROCm team) label Sep 24, 2024
@pruthvistony pruthvistony added this to the 2.5.0 milestone Sep 24, 2024
@jataylo (Collaborator, Author) commented Sep 24, 2024

I can confirm these failures are unrelated.

This cherry-pick will resolve the fp8 failures: https://github.com/pytorch/pytorch/pull/136557/files/16ebb15a8d8de4200fddd5c7b7cb8143a834994c..8f94eaaf3da2977e90aef4df9816d0c88fc74da8

I'm not sure what the root cause of the flex-decode failures is (cc: @amdfaa @jithunnair-amd @jerrymannil), but they are not related to this change.

@kit1980 kit1980 merged commit dd73223 into pytorch:release/2.5 Sep 25, 2024
@jithunnair-amd (Collaborator)

Verified in the torch 2.5 final RC wheel (`pip3 install torch==2.5.0 torchvision --index-url https://download.pytorch.org/whl/test/rocm6.2`) that the `_inductor/runtime/triton_heuristics.py` file in the PyTorch wheel contains the fix:

            if self.device_props.type == "hip":
                if "waves_per_eu" in compile_meta:
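
A check like this could be scripted roughly as follows. This is only a sketch: `contains_fix` and `installed_heuristics_path` are hypothetical helper names, the regular expression assumes the two lines appear exactly as quoted above, and the script locates the file through `importlib.util.find_spec` so that it does not need to import torch.

```python
import importlib.util
import re
from pathlib import Path

# Matches the corrected guard followed by the waves_per_eu check,
# as quoted in the comment above (whitespace between them is flexible).
FIXED_PATTERN = re.compile(
    r'if self\.device_props\.type == "hip":\s+if "waves_per_eu" in compile_meta:'
)


def contains_fix(source: str) -> bool:
    """Return True if the source text contains the corrected HIP guard."""
    return FIXED_PATTERN.search(source) is not None


def installed_heuristics_path():
    """Locate triton_heuristics.py in the installed torch package, if any."""
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None
    root = Path(list(spec.submodule_search_locations)[0])
    path = root / "_inductor" / "runtime" / "triton_heuristics.py"
    return path if path.is_file() else None


if __name__ == "__main__":
    path = installed_heuristics_path()
    if path is None:
        print("torch or triton_heuristics.py not found in this environment")
    else:
        source = path.read_text(encoding="utf-8")
        print(f"{path}: contains fix = {contains_fix(source)}")
```

A text match only confirms that the corrected lines are present in the wheel; it does not exercise the tuning path on a ROCm device.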

Labels

ciflow/inductor, ciflow/rocm (Trigger "default" config CI on ROCm), module: inductor, module: rocm (AMD GPU support for Pytorch), open source, rocm (This tag is for PRs from ROCm team)
