[ROCm] Skipping unit tests on ROCm: compile_preserves_metadata_cache and triton_autotuning#172681

Closed
apakbin wants to merge 1 commit into pytorch:main from apakbin:arpakbin-skiptests-rocm
Conversation

@apakbin
Contributor

@apakbin apakbin commented Jan 16, 2026

@pytorch-bot

pytorch-bot Bot commented Jan 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172681

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added module: inductor module: rocm AMD GPU support for Pytorch topic: not user facing topic category labels Jan 16, 2026
Comment thread test/inductor/test_aot_inductor.py Outdated
):
torch._export.aot_compile(Model(), (x, y, m))

@skipIfRocm # ROCm does not support the config block size used in the test suite.
Collaborator


Should we skip or xfail here?

Collaborator


Makes sense to me to change the UT on ROCm to use a supported config
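The skip-vs-xfail distinction raised above can be sketched as a small helper (hypothetical, not PyTorch's actual decorators; `IS_ROCM` stands in for a `torch.version.hip` check): `skip` never runs the test on ROCm, while expected-failure still runs it and flags an unexpected pass.

```python
import unittest

IS_ROCM = True  # assumption: stand-in for `torch.version.hip is not None`

def rocm_skip_or_xfail(use_xfail: bool):
    """Return a decorator implementing one of the two options discussed:
    skip (never run on ROCm) vs. expected failure (run, tolerate failure,
    and flag an unexpected pass -- pytest spells this `xfail`)."""
    if not IS_ROCM:
        return lambda fn: fn  # no-op off ROCm
    if use_xfail:
        return unittest.expectedFailure
    return unittest.skip("config block size unsupported on ROCm")
```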

Comment thread test/test_nestedtensor.py
@skipIfTorchDynamo("Test compiles internally")
# efficient_attention_forward meta kernel shape mismatch on CDNA - see issue #171568
@skipIfRocmArch(MI200_ARCH + MI300_ARCH)
@skipIfRocmArch(MI200_ARCH + MI300_ARCH + MI350_ARCH)
Collaborator


We should probably just use skipIfRocm instead of bloating this

@linux-foundation-easycla

linux-foundation-easycla Bot commented Jan 19, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: apakbin / name: Arash Pakbin (6430a5d)

@apakbin apakbin force-pushed the arpakbin-skiptests-rocm branch from 39cd5cf to 6430a5d on January 19, 2026 18:59
@apakbin apakbin closed this Jan 19, 2026
@pytorch-bot pytorch-bot Bot added module: amp (automated mixed precision) autocast module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration release notes: quantization release notes category labels Jan 19, 2026
pytorchmergebot pushed a commit that referenced this pull request Jan 23, 2026
This PR fixes ROCm-specific unit test issues:

* **`test_sdpa_autocast`**, **`test_sdpa_backwards`**, and **`test_compile_preserves_metadata_cache`**: Currently skipped on AMD MI200 and MI300. This PR extends the skip to all ROCm architectures for consistency.

* **`test_mm_plus_mm3`**: Replaced `expectedFailureDynamicWrapper` with `pytest.mark.xfail(condition=not torch.version.hip, ...)`, since the C++ wrapper with dynamic shapes passes on ROCm.

* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by using 32736 only for MI300 (gfx94x), and 1023 for other architectures including MI350, which matches CUDA behavior. This enables the test to run on ROCm instead of being skipped entirely as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939).

* **`test_triton_mutated_autotuning`**: Applied the same grid value fix as `test_triton_autotuning`.

* **All tests in `test/inductor/test_select_algorithm.py`**: Guarded the `self.assertEqual(counters["inductor"]["select_algorithm_autotune"], ...)` assertions so they do not run on ROCm, as autotuning behavior is non-deterministic on this platform (candidate prescreening may filter more aggressively based on architecture-specific kernel availability).

Prior version of PR: #172681.

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily
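The counter guard described in the `test_select_algorithm.py` bullet above can be sketched as follows. This is a hedged stand-in, not the actual Inductor test code: `IS_ROCM` and `compile_with_autotune_stub` are illustrative.

```python
from collections import Counter

IS_ROCM = False  # assumption: stand-in for `torch.version.hip is not None`

counters = {"inductor": Counter()}

def compile_with_autotune_stub():
    # Stand-in for a compilation that autotunes; on ROCm the number of
    # autotune events can vary with architecture-specific kernel
    # availability, so the exact count is not asserted there.
    counters["inductor"]["select_algorithm_autotune"] += 1

compile_with_autotune_stub()

# The guard: only pin the exact counter value off ROCm, where
# autotuning is deterministic.
if not IS_ROCM:
    assert counters["inductor"]["select_algorithm_autotune"] == 1
```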
pytorchmergebot pushed a commit that referenced this pull request Feb 4, 2026
…2780)

This PR fixes ROCm-specific unit test issues:

* **`test_sdpa_autocast`**, **`test_sdpa_backwards`**, and **`test_compile_preserves_metadata_cache`**: Currently skipped on AMD MI200 and MI300. This PR extends the skip to all ROCm architectures for consistency.

This PR will also enable the below DISABLED github issues to be closed:

`test_sdpa_autocast`:
Fixes #173715

`test_sdpa_backwards`:
Fixes #173712
Fixes #173713
Fixes #173714

`test_compile_preserves_metadata_cache`:
Fixes #173717

* **`test_mm_plus_mm3`**: Replaced `expectedFailureDynamicWrapper` with `pytest.mark.xfail(condition=not torch.version.hip, ...)`, since the C++ wrapper with dynamic shapes passes on ROCm.

* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by using 32736 only for MI300 (gfx94x), and 1023 for other architectures including MI350, which matches CUDA behavior. This enables the test to run on ROCm instead of being skipped entirely as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939).

Fixes #173619

* **`test_triton_mutated_autotuning`**: Applied the same grid value fix as `test_triton_autotuning`.

Fixes #173620

* **All tests in `test/inductor/test_select_algorithm.py`**: Guarded the `self.assertEqual(counters["inductor"]["select_algorithm_autotune"], ...)` assertions so they do not run on ROCm, as autotuning behavior is non-deterministic on this platform (candidate prescreening may filter more aggressively based on architecture-specific kernel availability).

* **`test_copy_non_blocking_is_pinned`**: Observed failures on Navi machines; skipped while they are being investigated.

* **`test_2d_reduction_odd_shapes`**: On ROCm (Navi vs. MI*), backend scheduling differences can produce one fewer block descriptor than expected. Updated the test to allow at most one fewer block descriptor (minimum 1). Ultimately skipped this on Navi/MI200 due to flakiness in the final part of the test, which matches BLOCK_R0 and BLOCK_R1 in the generated code; the behavior was not consistent across CI runs.

* **`test_upsample_layout`**: On ROCm, `bfloat16` may use `extern_kernels.convolution` instead of MKLDNN. Updated the test to check for `extern_kernels.convolution` when MKLDNN is not present, since `transpose_mxn` is only required for MKLDNN.

Prior version of PR: #172681.

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily
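The `test_upsample_layout` fallback check described above can be sketched like this. It is a hedged illustration: `HAS_MKLDNN`, `check_upsample_codegen`, and the generated-code string are stand-ins, not the actual Inductor output or test helpers.

```python
# Illustrative stand-ins, not real Inductor output.
HAS_MKLDNN = False
generated_code = "buf1 = extern_kernels.convolution(arg0_1, arg1_1)"

def check_upsample_codegen(code: str, has_mkldnn: bool) -> None:
    """Mirror the updated assertion: require the MKLDNN-specific
    transpose only when MKLDNN is present, otherwise accept the
    extern convolution fallback (e.g. bfloat16 on ROCm)."""
    if has_mkldnn:
        assert "transpose_mxn" in code
    else:
        assert "extern_kernels.convolution" in code

check_upsample_codegen(generated_code, HAS_MKLDNN)
```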
pytorchmergebot pushed a commit that referenced this pull request Feb 7, 2026
…2780)

Same commit message as the February 4 commit above, except that the `test_triton_autotuning` fix was reworked:

* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by checking whether the grid value is one of the possible values implied by the autotune configs. This enables the test to run on ROCm instead of being skipped entirely as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939).

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
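The reworked `test_triton_autotuning` check can be sketched as a membership test over the grid sizes the autotune configs could produce. The numbers here are illustrative assumptions, chosen only because 1023 and 32736 are the two grid values the commit messages mention; `possible_grids` is a hypothetical helper, not the actual test code.

```python
def possible_grids(n_elements: int, block_sizes) -> set[int]:
    """Grid sizes a 1-D Triton launch could use: ceil(n / BLOCK) for
    each candidate BLOCK in the autotune configs."""
    return {-(-n_elements // b) for b in block_sizes}

# Illustrative: with these block sizes, 1024 * 1023 elements yield
# exactly the two grid values mentioned in the commit messages.
n = 1024 * 1023
allowed = possible_grids(n, [32, 1024])

observed_grid = 1023  # would come from the compiled kernel's launch
assert observed_grid in allowed
```

Asserting membership in `allowed` instead of equality with one hard-coded value lets the same test pass regardless of which config autotuning selects on a given architecture.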
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
libohao1201 pushed a commit to libohao1201/pytorch that referenced this pull request Mar 2, 2026

Labels

module: amp (automated mixed precision) autocast module: cpu CPU specific problem (e.g., perf, algorithm) module: dynamo module: inductor module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration module: rocm AMD GPU support for Pytorch open source release notes: distributed (dtensor) release notes category release notes: quantization release notes category topic: not user facing topic category
