[AMD] Fix AMD User Defined Kernel Autotune #160671

Closed

oniononion36 wants to merge 1 commit into pytorch:main from oniononion36:export-D80285441

Contributor

oniononion36 commented Aug 14, 2025 •

edited by pytorch-bot bot


Summary: AMD-specific kwargs need to be removed from the guard; otherwise a KeyError is raised when executing the kernel.
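For readers unfamiliar with the failure mode, here is a minimal sketch of one way such a KeyError can arise and how filtering avoids it. It is not the code changed by this PR: `AMD_ONLY_KWARGS`, `strip_amd_kwargs`, and `config_matches` are hypothetical names, and the set of AMD-only options (`waves_per_eu`, `matrix_instr_nonkdim`, `kpack`) is an assumption about which kwargs are involved.

```python
# Illustrative sketch only: the names below are hypothetical, not PyTorch APIs.

# Assumed set of AMD-only Triton compile options carried in a config's kwargs.
AMD_ONLY_KWARGS = frozenset({"waves_per_eu", "matrix_instr_nonkdim", "kpack"})


def strip_amd_kwargs(config_kwargs):
    """Return a copy of config_kwargs without the AMD-only entries."""
    return {k: v for k, v in config_kwargs.items() if k not in AMD_ONLY_KWARGS}


def config_matches(config_kwargs, call_args):
    """Hypothetical guard: every config entry must equal the call argument."""
    return all(call_args[name] == value for name, value in config_kwargs.items())


config_kwargs = {"BLOCK_SIZE": 128, "waves_per_eu": 2}
call_args = {"BLOCK_SIZE": 128}  # the kernel itself has no waves_per_eu parameter

# Without filtering, the guard looks up a key the kernel never receives.
try:
    config_matches(config_kwargs, call_args)
    unfiltered_result = "ok"
except KeyError as exc:
    unfiltered_result = f"KeyError: {exc}"

# With the AMD-only kwargs stripped first, the lookup succeeds.
filtered_result = config_matches(strip_amd_kwargs(config_kwargs), call_args)

print(unfiltered_result)
print(filtered_result)
```

The sketch returns a filtered copy rather than mutating the config, so the AMD-only options remain available to whatever later passes them to the compiler.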

Test Plan:

buck2 run mode/opt-amd-gpu -m rocm641 -c fbcode.split-dwarf=true -c fbcode.use_link_groups=true -c fbcode.enable_gpu_sections=true //hpc/new/models/feed/benchmark:feed_lower_benchmark -- --load=manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/894698382/0/gpu_lowering/new_input8 --skip-eager --skip-flop-estimation --sync-mode=0 --lower-backend=AOT_INDUCTOR

can succeed after this change.

Rollback Plan:

Differential Revision: D80285441

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

pytorch-bot bot commented Aug 14, 2025 •

edited


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160671

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit c91acda with merge base 67fc16c:

NEW FAILURES - The following jobs have failed:

inductor-rocm / rocm-py3.10-inductor / test (inductor, 2, 2, linux.rocm.gpu.2) (gh)
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda
inductor-rocm-mi300 / rocm-py3.10-inductor-mi300 / test (inductor, 1, 2, linux.rocm.gpu.gfx942.1) (gh)
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda
rocm / linux-jammy-rocm-py3.10 / test (default, 3, 6, linux.rocm.gpu.2) (gh)
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added ciflow/inductor module: inductor labels

Contributor

facebook-github-bot commented Aug 14, 2025

This pull request was exported from Phabricator. Differential Revision: D80285441

facebook-github-bot added the fb-exported label

oniononion36 added the topic: not user facing label

oniononion36 force-pushed the export-D80285441 branch from 530f5ff to 8e9970e Compare

August 14, 2025 20:28

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)

8e9970e

Summary:

AMD-specific kwargs need to be removed from the guard; otherwise a KeyError is raised when executing the kernel.

Test Plan:
```
buck2 run mode/opt-amd-gpu -m rocm641 -c fbcode.split-dwarf=true -c fbcode.use_link_groups=true -c fbcode.enable_gpu_sections=true //hpc/new/models/feed/benchmark:feed_lower_benchmark -- --load=manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/894698382/0/gpu_lowering/new_input8 --skip-eager --skip-flop-estimation --sync-mode=0 --lower-backend=AOT_INDUCTOR
```
can succeed after this change.

Rollback Plan:

Differential Revision: D80285441

Contributor

facebook-github-bot commented Aug 14, 2025

This pull request was exported from Phabricator. Differential Revision: D80285441

oniononion36 force-pushed the export-D80285441 branch from 8e9970e to c56ac13 Compare

August 14, 2025 20:48

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)

c56ac13


Contributor

facebook-github-bot commented Aug 14, 2025

This pull request was exported from Phabricator. Differential Revision: D80285441

oniononion36 force-pushed the export-D80285441 branch from c56ac13 to 67962e0 Compare

August 19, 2025 19:40

Contributor

facebook-github-bot commented Aug 19, 2025

This pull request was exported from Phabricator. Differential Revision: D80285441

pytorch-bot bot pushed a commit that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (#160671)

67962e0


oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)

e4f3eb2


oniononion36 force-pushed the export-D80285441 branch from 67962e0 to e4f3eb2 Compare

August 19, 2025 20:07

Contributor

facebook-github-bot commented Aug 19, 2025

This pull request was exported from Phabricator. Differential Revision: D80285441

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)

58dee07

Summary:
Pull Request resolved: pytorch#160671


oniononion36 force-pushed the export-D80285441 branch from e4f3eb2 to 58dee07 Compare

August 19, 2025 20:11

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)

fb52182


oniononion36 force-pushed the export-D80285441 branch from 58dee07 to fb52182 Compare

August 19, 2025 20:19

Contributor

facebook-github-bot commented Aug 19, 2025

This pull request was exported from Phabricator. Differential Revision: D80285441

oniononion36 force-pushed the export-D80285441 branch from fb52182 to dd6fc94 Compare

August 19, 2025 20:39

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)

dd6fc94


Contributor

facebook-github-bot commented Aug 19, 2025

This pull request was exported from Phabricator. Differential Revision: D80285441

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)

103efd9


oniononion36 force-pushed the export-D80285441 branch from dd6fc94 to 103efd9 Compare

August 19, 2025 21:03

Contributor

facebook-github-bot commented Aug 19, 2025

This pull request was exported from Phabricator. Differential Revision: D80285441

oniononion36 force-pushed the export-D80285441 branch from 103efd9 to 188fb37 Compare

August 21, 2025 21:43

pytorchmergebot added the merging label

Collaborator

pytorchmergebot commented Aug 23, 2025

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot closed this in 431846a

pytorchmergebot added Merged and removed merging labels

Contributor

atalman commented Aug 25, 2025

@pytorchmergebot revert -c nosignal -m "new test is failing: inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda GH job link HUD commit link"

Collaborator

pytorchmergebot commented Aug 25, 2025

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request


          Revert "[AMD] Fix AMD User Defined Kernel Autotune (#160671)"

40c0e70

This reverts commit 431846a.

Reverted #160671 on behalf of https://github.com/atalman due to new test is failing: inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/17172795679/job/48725235301) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/431846a6323c6f1d02da49e311ac694324f386f4) ([comment](#160671 (comment)))

Collaborator

pytorchmergebot commented Aug 25, 2025

@oniononion36 your PR has been successfully reverted.

pytorchmergebot added Reverted ci-no-td labels

pytorchmergebot reopened this

Collaborator

jeffdaily commented Aug 25, 2025

See also https://hud.pytorch.org/failure?name=inductor-rocm%20%2F%20rocm-py3.10-inductor%20%2F%20test%20(inductor%2C%202%2C%202%2C%20linux.rocm.gpu.2)&jobName=undefined&failureCaptures=inductor%2Ftest_aot_inductor.py%3A%3AAOTInductorTestABICompatibleGpu%3A%3Atest_rocm_triton_autotuning_cuda

Contributor

facebook-github-bot commented Aug 25, 2025

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot added the merging label

Collaborator

pytorchmergebot commented Aug 25, 2025

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Collaborator

pytorchmergebot commented Aug 25, 2025

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team

Raised by workflow job

pytorchmergebot removed the merging label

atalman added keep-going ciflow/rocm ciflow/inductor-rocm labels

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          Reland [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671) (pytorch#161521)

f9ecba9

Summary:

This is a reland of D80285441 with the unit test fixed.

Test Plan:
```
buck2 run mode/opt-amd-gpu -m rocm641 -c fbcode.split-dwarf=true -c fbcode.use_link_groups=true -c fbcode.enable_gpu_sections=true //hpc/new/models/feed/benchmark:feed_lower_benchmark -- --load=manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/894698382/0/gpu_lowering/new_input8 --skip-eager --skip-flop-estimation --sync-mode=0 --lower-backend=AOT_INDUCTOR

```
will succeed after this diff.

Rollback Plan:

Reviewed By: frank-wei

Differential Revision: D80971224

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request


          Reland [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671) (pytorch#161521)

96634bf

Summary:
Pull Request resolved: pytorch#161521


Contributor

jeanschmidt commented Sep 1, 2025

Unfortunately it is not possible to merge this PR in OSS. I am reverting the diff internally. In order to follow up with these changes, please start fresh with a new Diff/PR pair.

jeanschmidt closed this

pytorch-bot bot pushed a commit that referenced this pull request


          Reland [AMD] Fix AMD User Defined Kernel Autotune (#160671) (#161521)

e756ee4


pytorch-bot bot pushed a commit that referenced this pull request


          Reland [AMD] Fix AMD User Defined Kernel Autotune (#160671) (#161521)

645d1a9


markc-614 pushed a commit to markc-614/pytorch that referenced this pull request


          [AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)

a73f248

Summary: AMD-specific kwargs need to be removed from the guard; otherwise a KeyError is raised when executing the kernel.

Test Plan:
```
buck2 run mode/opt-amd-gpu -m rocm641 -c fbcode.split-dwarf=true -c fbcode.use_link_groups=true -c fbcode.enable_gpu_sections=true //hpc/new/models/feed/benchmark:feed_lower_benchmark -- --load=manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/894698382/0/gpu_lowering/new_input8 --skip-eager --skip-flop-estimation --sync-mode=0 --lower-backend=AOT_INDUCTOR
```
can succeed after this change.

Rollback Plan:

Differential Revision: D80285441

Pull Request resolved: pytorch#160671
Approved by: https://github.com/muchulee8

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request


          Revert "[AMD] Fix AMD User Defined Kernel Autotune (pytorch#160671)"

d1b7de1

This reverts commit 431846a.

Reverted pytorch#160671 on behalf of https://github.com/atalman due to new test is failing: inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/17172795679/job/48725235301) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/431846a6323c6f1d02da49e311ac694324f386f4) ([comment](pytorch#160671 (comment)))


Labels

ci-no-td ciflow/inductor ciflow/inductor-rocm ciflow/rocm ciflow/trunk fb-exported keep-going Merged module: inductor Reverted topic: not user facing