[ROCm] Skipping unit tests on ROCm: compile_preserves_metadata_cache and triton_autotuning#172681

Closed
apakbin wants to merge 1 commit into pytorch:main from apakbin:arpakbin-skiptests-rocm
Conversation

@apakbin
Contributor

@apakbin apakbin commented Jan 16, 2026

@pytorch-bot

pytorch-bot Bot commented Jan 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172681

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added module: inductor module: rocm AMD GPU support for Pytorch topic: not user facing topic category labels Jan 16, 2026
Comment thread test/inductor/test_aot_inductor.py Outdated
):
torch._export.aot_compile(Model(), (x, y, m))

@skipIfRocm # ROCm does not support the config block size used in the test suite.
Collaborator


Should we skip or xfail here?

Collaborator


Makes sense to me to change the UT on ROCm to use a supported config
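The skip-vs-xfail distinction raised above can be sketched as a small helper (hypothetical, not PyTorch's actual decorators; `IS_ROCM` stands in for a `torch.version.hip` check): `skip` never runs the test on ROCm, while expected-failure still runs it and flags an unexpected pass.

```python
import unittest

IS_ROCM = True  # assumption: stand-in for `torch.version.hip is not None`

def rocm_skip_or_xfail(use_xfail: bool):
    """Return a decorator implementing one of the two options discussed:
    skip (never run on ROCm) vs. expected failure (run, tolerate failure,
    and flag an unexpected pass -- pytest spells this `xfail`)."""
    if not IS_ROCM:
        return lambda fn: fn  # no-op off ROCm
    if use_xfail:
        return unittest.expectedFailure
    return unittest.skip("config block size unsupported on ROCm")
```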

Comment thread test/test_nestedtensor.py
@skipIfTorchDynamo("Test compiles internally")
# efficient_attention_forward meta kernel shape mismatch on CDNA - see issue #171568
@skipIfRocmArch(MI200_ARCH + MI300_ARCH)
@skipIfRocmArch(MI200_ARCH + MI300_ARCH + MI350_ARCH)
Collaborator


We should probably just use skipIfRocm instead of bloating this

@linux-foundation-easycla

linux-foundation-easycla Bot commented Jan 19, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: apakbin / name: Arash Pakbin (6430a5d)

@apakbin apakbin force-pushed the arpakbin-skiptests-rocm branch from 39cd5cf to 6430a5d on January 19, 2026 18:59
@apakbin apakbin closed this Jan 19, 2026
@pytorch-bot pytorch-bot Bot added module: amp (automated mixed precision) autocast module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration release notes: quantization release notes category labels Jan 19, 2026
pytorchmergebot pushed a commit that referenced this pull request Jan 23, 2026
This PR fixes ROCm-specific unit test issues:

* **`test_sdpa_autocast`**, **`test_sdpa_backwards`**, and **`test_compile_preserves_metadata_cache`**: Currently skipped on AMD MI200 and MI300. This PR extends the skip to all ROCm architectures for consistency.

* **`test_mm_plus_mm3`**: Replaced `expectedFailureDynamicWrapper` with `pytest.mark.xfail(condition=not torch.version.hip, ...)`, since the C++ wrapper with dynamic shapes passes on ROCm.

* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by using 32736 only for MI300 (gfx94x), and 1023 for other architectures including MI350, which matches CUDA behavior. This enables the test to run on ROCm instead of being skipped entirely as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939).

* **`test_triton_mutated_autotuning`**: Applied the same grid value fix as `test_triton_autotuning`.

* **All tests in `test/inductor/test_select_algorithm.py`**: Guarded the `self.assertEqual(counters["inductor"]["select_algorithm_autotune"], ...)` assertions so they do not run on ROCm, as autotuning behavior is non-deterministic on this platform (candidate prescreening may filter more aggressively based on architecture-specific kernel availability).

Prior version of PR: #172681.

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily
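The counter guard described in the `test_select_algorithm.py` bullet above can be sketched as follows. This is a hedged stand-in, not the actual Inductor test code: `IS_ROCM` and `compile_with_autotune_stub` are illustrative.

```python
from collections import Counter

IS_ROCM = False  # assumption: stand-in for `torch.version.hip is not None`

counters = {"inductor": Counter()}

def compile_with_autotune_stub():
    # Stand-in for a compilation that autotunes; on ROCm the number of
    # autotune events can vary with architecture-specific kernel
    # availability, so the exact count is not asserted there.
    counters["inductor"]["select_algorithm_autotune"] += 1

compile_with_autotune_stub()

# The guard: only pin the exact counter value off ROCm, where
# autotuning is deterministic.
if not IS_ROCM:
    assert counters["inductor"]["select_algorithm_autotune"] == 1
```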
pytorchmergebot pushed a commit that referenced this pull request Feb 4, 2026
…2780)

This PR fixes ROCm-specific unit test issues:

* **`test_sdpa_autocast`**, **`test_sdpa_backwards`**, and **`test_compile_preserves_metadata_cache`**: Currently skipped on AMD MI200 and MI300. This PR extends the skip to all ROCm architectures for consistency.

This PR will also enable the below DISABLED github issues to be closed:

`test_sdpa_autocast`:
Fixes #173715

`test_sdpa_backwards`:
Fixes #173712
Fixes #173713
Fixes #173714

`test_compile_preserves_metadata_cache`:
Fixes #173717

* **`test_mm_plus_mm3`**: Replaced `expectedFailureDynamicWrapper` with `pytest.mark.xfail(condition=not torch.version.hip, ...)`, since the C++ wrapper with dynamic shapes passes on ROCm.

* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by using 32736 only for MI300 (gfx94x), and 1023 for other architectures including MI350, which matches CUDA behavior. This enables the test to run on ROCm instead of being skipped entirely as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939).

Fixes #173619

* **`test_triton_mutated_autotuning`**: Applied the same grid value fix as `test_triton_autotuning`.

Fixes #173620

* **All tests in `test/inductor/test_select_algorithm.py`**: Guarded the `self.assertEqual(counters["inductor"]["select_algorithm_autotune"], ...)` assertions so they do not run on ROCm, as autotuning behavior is non-deterministic on this platform (candidate prescreening may filter more aggressively based on architecture-specific kernel availability).

* **`test_copy_non_blocking_is_pinned`**: Observed failures on Navi machines; skipped while they are being investigated.

* **`test_2d_reduction_odd_shapes`**: On ROCm (Navi vs. MI*), backend scheduling differences can produce one fewer block descriptor than expected. Updated the test to allow at most one fewer block descriptor (minimum 1). Ultimately skipped this on Navi/MI200 due to flakiness in the final part of the test, which matches BLOCK_R0 and BLOCK_R1 in the generated code; the behavior was not consistent across CI runs.

* **`test_upsample_layout`**: On ROCm, `bfloat16` may use `extern_kernels.convolution` instead of MKLDNN. Updated the test to check for `extern_kernels.convolution` when MKLDNN is not present, since `transpose_mxn` is only required for MKLDNN.

Prior version of PR: #172681.

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily
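The `test_upsample_layout` fallback check described above can be sketched like this. It is a hedged illustration: `HAS_MKLDNN`, `check_upsample_codegen`, and the generated-code string are stand-ins, not the actual Inductor output or test helpers.

```python
# Illustrative stand-ins, not real Inductor output.
HAS_MKLDNN = False
generated_code = "buf1 = extern_kernels.convolution(arg0_1, arg1_1)"

def check_upsample_codegen(code: str, has_mkldnn: bool) -> None:
    """Mirror the updated assertion: require the MKLDNN-specific
    transpose only when MKLDNN is present, otherwise accept the
    extern convolution fallback (e.g. bfloat16 on ROCm)."""
    if has_mkldnn:
        assert "transpose_mxn" in code
    else:
        assert "extern_kernels.convolution" in code

check_upsample_codegen(generated_code, HAS_MKLDNN)
```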
pytorchmergebot pushed a commit that referenced this pull request Feb 7, 2026
…2780)

Same commit message as the February 4 commit above, except that the `test_triton_autotuning` fix was reworked:

* **`test_triton_autotuning`**: This test was failing on ROCm because it expected a grid value of 32736 for all AMD architectures. Fixed by checking whether the grid value is one of the possible values implied by the autotune configs. This enables the test to run on ROCm instead of being skipped entirely as it is in [rocm/pytorch](https://github.com/ROCm/pytorch/blob/f742da38826cf947bc52fdcbba7211d6853e0369/test/inductor/test_aot_inductor.py#L6939).

Pull Request resolved: #172780
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
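The reworked `test_triton_autotuning` check can be sketched as a membership test over the grid sizes the autotune configs could produce. The numbers here are illustrative assumptions, chosen only because 1023 and 32736 are the two grid values the commit messages mention; `possible_grids` is a hypothetical helper, not the actual test code.

```python
def possible_grids(n_elements: int, block_sizes) -> set[int]:
    """Grid sizes a 1-D Triton launch could use: ceil(n / BLOCK) for
    each candidate BLOCK in the autotune configs."""
    return {-(-n_elements // b) for b in block_sizes}

# Illustrative: with these block sizes, 1024 * 1023 elements yield
# exactly the two grid values mentioned in the commit messages.
n = 1024 * 1023
allowed = possible_grids(n, [32, 1024])

observed_grid = 1023  # would come from the compiled kernel's launch
assert observed_grid in allowed
```

Asserting membership in `allowed` instead of equality with one hard-coded value lets the same test pass regardless of which config autotuning selects on a given architecture.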
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
libohao1201 pushed a commit to libohao1201/pytorch that referenced this pull request Mar 2, 2026

Labels

module: amp (automated mixed precision) autocast module: cpu CPU specific problem (e.g., perf, algorithm) module: dynamo module: inductor module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration module: rocm AMD GPU support for Pytorch open source release notes: distributed (dtensor) release notes category release notes: quantization release notes category topic: not user facing topic category
