[ROCm] Skipping unit tests on ROCm: compile_preserves_metadata_cache and triton_autotuning #172681
Closed
apakbin wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172681
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Skylion007 reviewed on Jan 17, 2026
```python
):
    torch._export.aot_compile(Model(), (x, y, m))


@skipIfRocm  # ROCm does not support the config block size in test suite.
```
Should we skip or xfail here?
jataylo reviewed on Jan 19, 2026
Makes sense to me to change the UT on ROCm to use a supported config
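For reference, a minimal sketch of the distinction under discussion: a skip never runs the test, while an xfail runs it and flags an unexpected pass. The class and test names below are hypothetical; `skipIfRocm` is the decorator from `torch.testing._internal.common_utils`.

```python
import unittest

from torch.testing._internal.common_utils import skipIfRocm


class SkipVsXfailSketch(unittest.TestCase):
    @skipIfRocm  # skip: never executed on ROCm, so regressions there go unnoticed
    def test_skipped_on_rocm(self):
        self.assertEqual(1 + 1, 2)

    @unittest.expectedFailure  # xfail: still executed; an unexpected pass is flagged
    def test_expected_to_fail(self):
        self.assertEqual(1 + 1, 3)


if __name__ == "__main__":
    unittest.main()
```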
```diff
 @skipIfTorchDynamo("Test compiles internally")
 # efficient_attention_forward meta kernel shape mismatch on CDNA - see issue #171568
-@skipIfRocmArch(MI200_ARCH + MI300_ARCH)
+@skipIfRocmArch(MI200_ARCH + MI300_ARCH + MI350_ARCH)
```
We should probably just use skipIfRocm instead of bloating this
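A hedged sketch of the simplification being suggested, assuming the decorators from `torch.testing._internal.common_utils`; the class name and test body are placeholders.

```python
import unittest

from torch.testing._internal.common_utils import skipIfRocm


class BlanketSkipSketch(unittest.TestCase):
    # Instead of enumerating architectures, e.g.
    #   @skipIfRocmArch(MI200_ARCH + MI300_ARCH + MI350_ARCH)
    # a blanket skip avoids growing the list with each new GPU generation:
    @skipIfRocm  # efficient_attention_forward meta kernel shape mismatch on CDNA
    def test_sdpa_sketch(self):
        ...
```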
Force-pushed from 39cd5cf to 6430a5d
pytorchmergebot pushed a commit that referenced this pull request on Jan 23, 2026
This PR fixes ROCm-specific unit test issues:

* **`test_sdpa_autocast`**, **`test_sdpa_backwards`**, and **`test_compile_preserves_metadata_cache`**: Currently skipped on AMD MI200 and MI300. This PR extends the skip to all ROCm architectures for consistency.
* **`test_mm_plus_mm3`**: Replaced `expectedFailureDynamicWrapper` with `pytest.mark.xfail(condition=not torch.version.hip, ...)`. C++ wrapper dynamic shapes passes on ROCm.
* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by using 32736 only for MI300 (gfx94x) and 1023 for other architectures, including MI350, which matches CUDA behavior. This enables the test to run on ROCm instead of being skipped entirely, as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939).
* **`test_triton_mutated_autotuning`**: Applied the same grid value fix as `test_triton_autotuning`.
* All tests in **`test/inductor/test_select_algorithm.py`**: Guarded the `self.assertEqual(counters["inductor"]["select_algorithm_autotune"], ...)` assertions so they do not run on ROCm, as autotuning behavior is non-deterministic on this platform (candidate prescreening may filter more aggressively based on architecture-specific kernel availability).

Prior version of PR: #172681.

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily
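A minimal sketch of the conditional-xfail pattern described for `test_mm_plus_mm3`; the test body here is illustrative, not the real test.

```python
import pytest
import torch


@pytest.mark.xfail(
    condition=not torch.version.hip,  # torch.version.hip is None on non-ROCm builds
    reason="C++ wrapper dynamic shapes passes on ROCm",
)
def test_mm_plus_mm_sketch():
    a, b = torch.randn(8, 8), torch.randn(8, 8)
    # torch.addmm(input, mat1, mat2) computes input + mat1 @ mat2
    torch.testing.assert_close(a @ b + a @ b, torch.addmm(a @ b, a, b))
```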
pytorchmergebot pushed a commit that referenced this pull request on Feb 4, 2026
This PR fixes ROCm-specific unit test issues:

* **`test_sdpa_autocast`**, **`test_sdpa_backwards`**, and **`test_compile_preserves_metadata_cache`**: Currently skipped on AMD MI200 and MI300. This PR extends the skip to all ROCm architectures for consistency. This also enables the following DISABLED GitHub issues to be closed:
  * `test_sdpa_autocast`: Fixes #173715
  * `test_sdpa_backwards`: Fixes #173712, Fixes #173713, Fixes #173714
  * `test_compile_preserves_metadata_cache`: Fixes #173717
* **`test_mm_plus_mm3`**: Replaced `expectedFailureDynamicWrapper` with `pytest.mark.xfail(condition=not torch.version.hip, ...)`. C++ wrapper dynamic shapes passes on ROCm.
* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by using 32736 only for MI300 (gfx94x) and 1023 for other architectures, including MI350, which matches CUDA behavior. This enables the test to run on ROCm instead of being skipped entirely, as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939). Fixes #173619
* **`test_triton_mutated_autotuning`**: Applied the same grid value fix as `test_triton_autotuning`. Fixes #173620
* All tests in **`test/inductor/test_select_algorithm.py`**: Guarded the `self.assertEqual(counters["inductor"]["select_algorithm_autotune"], ...)` assertions so they do not run on ROCm, as autotuning behavior is non-deterministic on this platform (candidate prescreening may filter more aggressively based on architecture-specific kernel availability).
* **`test_copy_non_blocking_is_pinned`**: Observed failures on Navi machines; skipped while they are being investigated.
* **`test_2d_reduction_odd_shapes`**: On ROCm (Navi vs MI*), backend scheduling differences can cause one fewer block descriptor than expected. Updated the test to allow at most one fewer block descriptor (minimum 1). Finally, skipped on Navi/MI200 due to flakiness in the last part of the test, which matches BLOCK_R0 and BLOCK_R1 in the generated code; the behavior is not consistent across CI runs.
* **`test_upsample_layout`**: On ROCm, `bfloat16` may use `extern_kernels.convolution` instead of MKLDNN. Updated the test to check for `extern_kernels.convolution` when MKLDNN is not present, since `transpose_mxn` is only required for MKLDNN.

Prior version of PR: #172681.

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily
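A sketch, under the assumption that `torch.version.hip` identifies a ROCm build, of how such a counter assertion can be guarded; the expected value of 1 is a placeholder, not the real test's expectation.

```python
import torch
from torch._dynamo.utils import counters

# Only check the exact autotune count off ROCm; on ROCm, candidate
# prescreening is architecture-dependent, so the value is not stable.
if not torch.version.hip:
    assert counters["inductor"]["select_algorithm_autotune"] == 1  # placeholder value
```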
pytorchmergebot pushed a commit that referenced this pull request on Feb 7, 2026
This PR fixes ROCm-specific unit test issues:

* **`test_sdpa_autocast`**, **`test_sdpa_backwards`**, and **`test_compile_preserves_metadata_cache`**: Currently skipped on AMD MI200 and MI300. This PR extends the skip to all ROCm architectures for consistency. This also enables the following DISABLED GitHub issues to be closed:
  * `test_sdpa_autocast`: Fixes #173715
  * `test_sdpa_backwards`: Fixes #173712, Fixes #173713, Fixes #173714
  * `test_compile_preserves_metadata_cache`: Fixes #173717
* **`test_mm_plus_mm3`**: Replaced `expectedFailureDynamicWrapper` with `pytest.mark.xfail(condition=not torch.version.hip, ...)`. C++ wrapper dynamic shapes passes on ROCm.
* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by checking whether the grid value is one of the possible values implied by the configs. This enables the test to run on ROCm instead of being skipped entirely, as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939). Fixes #173619
* **`test_triton_mutated_autotuning`**: Applied the same grid value fix as `test_triton_autotuning`. Fixes #173620
* All tests in **`test/inductor/test_select_algorithm.py`**: Guarded the `self.assertEqual(counters["inductor"]["select_algorithm_autotune"], ...)` assertions so they do not run on ROCm, as autotuning behavior is non-deterministic on this platform (candidate prescreening may filter more aggressively based on architecture-specific kernel availability).
* **`test_copy_non_blocking_is_pinned`**: Observed failures on Navi machines; skipped while they are being investigated.
* **`test_2d_reduction_odd_shapes`**: On ROCm (Navi vs MI*), backend scheduling differences can cause one fewer block descriptor than expected. Updated the test to allow at most one fewer block descriptor (minimum 1). Finally, skipped on Navi/MI200 due to flakiness in the last part of the test, which matches BLOCK_R0 and BLOCK_R1 in the generated code; the behavior is not consistent across CI runs.
* **`test_upsample_layout`**: On ROCm, `bfloat16` may use `extern_kernels.convolution` instead of MKLDNN. Updated the test to check for `extern_kernels.convolution` when MKLDNN is not present, since `transpose_mxn` is only required for MKLDNN.

Prior version of PR: #172681.

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
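A minimal sketch of the "grid value is one of the possible values" check; the concrete values come from the commit messages above, and the helper name is hypothetical.

```python
# Grid values mentioned above: 32736 on MI300 (gfx94x), 1023 on
# CUDA-like architectures such as MI350.
POSSIBLE_GRID_X = {32736, 1023}


def assert_grid_is_known(grid_x: int) -> None:
    # Accept any grid produced by one of the candidate configs rather
    # than hard-coding a single architecture-specific value.
    assert grid_x in POSSIBLE_GRID_X, f"unexpected grid_x: {grid_x}"
```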
This PR skips two unit tests on ROCm: `compile_preserves_metadata_cache` and `triton_autotuning`.
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @mcarilli @ptrblck @leslie-fang-intel @voznesenskym @penguinwu @EikanWang @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela