
Puts ROCm tests on default stream#26394

Closed
mruberry wants to merge 3 commits into master from rocm_test_hang

Conversation

@mruberry
Collaborator

@mruberry mruberry commented Sep 18, 2019

This PR has been updated since the ORIGINAL PR comment below.

ROCm CI builds have been hanging as we've been refactoring tests, even when these refactors seem entirely innocuous. This PR started by commenting out test_stft, for example, a Python test never run on ROCm, and that was sufficient to reliably hang the ROCm build in CI.

Putting ROCm tests back on the default stream appears to remove this hang. So this PR now does that. This is likely to unblock development.

ORIGINAL: Some test changes appear to be causing ROCm builds to hang in CI. This PR is an attempt to diagnose the source of the hang.
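For context, the change here amounts to running ROCm tests on the default stream instead of switching each test onto a fresh side stream. A minimal pure-Python sketch of that control flow, with hypothetical names standing in for `torch.cuda.Stream` and `torch.cuda.stream()` (this is an illustration, not the actual PyTorch test harness):

```python
from contextlib import contextmanager

class FakeStream:
    # Hypothetical stand-in for torch.cuda.Stream, used only to
    # illustrate the control flow.
    def __init__(self, name):
        self.name = name

DEFAULT_STREAM = FakeStream("default")
_current = DEFAULT_STREAM

@contextmanager
def stream(s):
    # Mimics torch.cuda.stream(): temporarily make s the current stream.
    global _current
    prev, _current = _current, s
    try:
        yield
    finally:
        _current = prev

def which_stream():
    return _current.name

def run_test(test_fn, on_rocm):
    # Before this PR: every test ran on a fresh side stream.
    # After this PR: ROCm tests stay on the default stream.
    if on_rocm:
        return test_fn()
    with stream(FakeStream("side")):
        return test_fn()
```

With this sketch, `run_test(which_stream, on_rocm=True)` reports the default stream, while `on_rocm=False` reports a side stream, mirroring the before/after behavior of the PR.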

@pytorchbot pytorchbot added the module: cuda Related to torch.cuda, and CUDA support in general label Sep 18, 2019
@pytorchbot pytorchbot added the module: tests Issues related to tests (not the torch.testing module) label Sep 18, 2019
@bddppq
Contributor

bddppq commented Sep 18, 2019

I suspect this is related to some of the recently enabled tests
#25963
#26055

cc @iotamudelta

@mruberry
Collaborator Author

Could be. I just kicked off a run with ROCm on the default stream, too, since that was changed recently.

@mruberry
Collaborator Author

Putting ROCm back on default stream appears to address the hang.

@ezyang

I'm going to update the PR to just make that change.

@mruberry mruberry requested review from bddppq and ezyang September 18, 2019 17:41
@mruberry mruberry changed the title from [TEST] Comments out test_stft to Puts ROCm tests on default stream Sep 18, 2019
Contributor

@facebook-github-bot facebook-github-bot left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in a561660.

facebook-github-bot pushed a commit that referenced this pull request Sep 18, 2019
… test classes (#26375)

Summary:
- Adds dtypes, dtypesIfCPU, and dtypesIfCUDA decorators.
- Eliminates the need for nontest members to be defined in an inherited base.
- Updates one test to use the decorators and updates TestTorchDeviceType with helpers.

This PR appears to be hanging the ROCm build, which is not entirely surprising. See #26394, which demonstrates that the ROCm build can be hung by commenting out a Python test that was never run on ROCm.

gchanan - what type list, if any, do you want to expose? I imagine most test suites will define their own lists like today. SCALAR_TYPES, QUANTIZED_TYPES, and ALL_TYPES seem reasonable to me. DOCUMENTED_TENSOR_TYPES will be removed, of course.
Pull Request resolved: #26375

Test Plan: The edit is to the tests themselves.

Differential Revision: D17462294

Pulled By: mruberry

fbshipit-source-id: f8259ec66709749b1bf8077efc737676af901436
jeffdaily added a commit to ROCm/pytorch that referenced this pull request Dec 2, 2020
facebook-github-bot pushed a commit that referenced this pull request Dec 10, 2020
Summary:
Revert #26394. Fixes #27356. Not all MIOpen handles were setting their stream to the current stream prior to running the op.
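The bug class behind this revert, a library handle left bound to one stream while work is launched on another, can be illustrated with a pure-Python mock. All names here are hypothetical; the actual fix rebinds each MIOpen handle to the current HIP stream before running the op.

```python
class Handle:
    # Mock of a MIOpen-style library handle that caches a stream.
    def __init__(self):
        self.stream = "default"

def run_op_buggy(handle, current_stream):
    # Buggy pattern: the handle keeps whatever stream it was created
    # with, so its work can land on the wrong stream and race or hang.
    return handle.stream

def run_op_fixed(handle, current_stream):
    # Fixed pattern: rebind the handle to the current stream before
    # launching (analogous to calling miopenSetStream per op).
    handle.stream = current_stream
    return handle.stream
```

With this mock, an op launched while a side stream is current still runs on the handle's stale default stream in the buggy version, but follows the current stream in the fixed version.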

Pull Request resolved: #48424

Reviewed By: H-Huang

Differential Revision: D25420384

Pulled By: mruberry

fbshipit-source-id: 051683ba9e3d264b71162bd344031a0c58bf6a41