[ROCm] enabling miopen_batch_norm lowering in inductor #105740
jataylo wants to merge 12 commits into pytorch:main from
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/105740
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 4 Unrelated Failures
As of commit 82faf5b:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but were present on the merge base 29f856e: 👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jataylo Are the perf issues (due to which the lowering was disabled) resolved in ROCm5.6 specifically?
@jithunnair-amd Not from ROCm5.6 specifically; it seems to be mostly Python/Triton updates that are closing the gap here. The primary motivation for pushing this change is a failure, introduced by some PyTorch change, in a few models that use batch_norm. We will keep tabs on performance and revert if deemed necessary. cc: @dllehr-amd
Hey @malfet, this enables lowering for miopen_batch_norm in inductor to avoid some failures and unskips the affected UTs. Could you help us approve this? The UT failures are unrelated to this change. The batch norm related UTs pass, e.g. cc: @jithunnair-amd
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 5 checks: pull / linux-focal-py3.8-gcc7 / test (distributed, 2, 2, linux.2xlarge), slow / linux-focal-rocm5.6-py3.8 / test (slow, 1, 1, linux.rocm.gpu), periodic / linux-focal-rocm5.6-py3.8 / test (distributed, 1, 2, linux.rocm.gpu, unstable), trunk / linux-focal-rocm5.6-py3.8 / test (default, 1, 3, linux.rocm.gpu, unstable), inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor_torchbench, 1, 1, linux.g5.4xlarge.nvidia.gpu, unstable)
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Enabling miopen_batch_norm lowering for inductor only.
This avoids errors observed in some models, and initial benchmarks show the performance difference is very small.
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @hongxiayang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov