Skip to content

Allow ROCm CI to use non-default stream.#48424

Closed
jeffdaily wants to merge 4 commits intopytorch:masterfrom
ROCm:rocm_tests_on_non_default_stream
Closed

Allow ROCm CI to use non-default stream.#48424
jeffdaily wants to merge 4 commits intopytorch:masterfrom
ROCm:rocm_tests_on_non_default_stream

Conversation

@jeffdaily
Copy link
Copy Markdown
Collaborator

@jeffdaily jeffdaily commented Nov 24, 2020

Revert #26394. Fixes #27356. Not all MIOpen handles were setting their stream to the current stream prior to running the op.

@jeffdaily jeffdaily added the module: rocm AMD GPU support for Pytorch label Nov 24, 2020
@jeffdaily
Copy link
Copy Markdown
Collaborator Author

This PR will test if #27356 is still really an issue. It's been a year since that issue was raised.

@dr-ci
Copy link
Copy Markdown

dr-ci Bot commented Nov 24, 2020

💊 CI failures summary and remediations

As of commit 5b01d9f (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/verbatim-sources/job-specs/pytorch-job-specs.yml 
Auto-merging .circleci/verbatim-sources/job-specs/pytorch-job-specs.yml 
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh 
Auto-merging .circleci/docker/build.sh 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py 
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py 
Auto-merging .circleci/cimodel/data/pytorch_build_data.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (2/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/verbatim-sources/job-specs/pytorch-job-specs.yml 
Auto-merging .circleci/verbatim-sources/job-specs/pytorch-job-specs.yml 
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh 
Auto-merging .circleci/docker/build.sh 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py 
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py 
Auto-merging .circleci/cimodel/data/pytorch_build_data.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (3/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/verbatim-sources/job-specs/pytorch-job-specs.yml 
Auto-merging .circleci/verbatim-sources/job-specs/pytorch-job-specs.yml 
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh 
Auto-merging .circleci/docker/build.sh 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py 
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py 
Auto-merging .circleci/cimodel/data/pytorch_build_data.py 
Automatic merge failed; fix conflicts and then commit the result. 

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 20 times.

@jeffdaily
Copy link
Copy Markdown
Collaborator Author

@pytorchbot retest this please

@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 24, 2020

Codecov Report

Merging #48424 (2e286e2) into master (3b25af0) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #48424      +/-   ##
==========================================
- Coverage   80.83%   80.83%   -0.01%     
==========================================
  Files        1860     1860              
  Lines      200668   200668              
==========================================
- Hits       162220   162219       -1     
- Misses      38448    38449       +1     

@jeffdaily jeffdaily marked this pull request as ready for review November 24, 2020 23:59
@jeffdaily
Copy link
Copy Markdown
Collaborator Author

@pytorchbot retest this please

@jeffdaily jeffdaily marked this pull request as draft November 25, 2020 00:06
@jeffdaily
Copy link
Copy Markdown
Collaborator Author

@pytorchbot retest this please

@jeffdaily jeffdaily force-pushed the rocm_tests_on_non_default_stream branch from e847edf to 503de05 Compare December 1, 2020 18:45
@jeffdaily jeffdaily force-pushed the rocm_tests_on_non_default_stream branch from 503de05 to 2e286e2 Compare December 2, 2020 17:07
This removes the util function setMIOpenStreamToCurrent() in favor of
always setting the current stream to the MIOpen handle when requested.
@jeffdaily jeffdaily requested a review from sunway513 December 4, 2020 22:08
@jeffdaily
Copy link
Copy Markdown
Collaborator Author

@jithunnair-amd could you also review? Thanks.

@jeffdaily jeffdaily marked this pull request as ready for review December 4, 2020 23:07
@jeffdaily jeffdaily marked this pull request as draft December 4, 2020 23:42
@jeffdaily
Copy link
Copy Markdown
Collaborator Author

CI failures due to upstream, not this PR. Retesting.

@jeffdaily
Copy link
Copy Markdown
Collaborator Author

@pytorchbot retest this please

@jeffdaily jeffdaily marked this pull request as ready for review December 7, 2020 20:44
@jeffdaily jeffdaily requested a review from mruberry December 7, 2020 20:47
@jeffdaily
Copy link
Copy Markdown
Collaborator Author

@mruberry I'm happy to finally resolve this year-old issue. Thanks in advance for your review.

Copy link
Copy Markdown

@sunway513 sunway513 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jeffdaily.

@mruberry
Copy link
Copy Markdown
Collaborator

mruberry commented Dec 8, 2020

Hey @jeffdaily, this looks OK to me. Do you want to wait for @jithunnair-amd?

@jithunnair-amd
Copy link
Copy Markdown
Collaborator

LGTM too

@zhangguanheng66 zhangguanheng66 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 8, 2020
Copy link
Copy Markdown
Collaborator

@mruberry mruberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@mruberry merged this pull request in d5c4a80.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@mruberry merged this pull request in d5c4a80.

jeffdaily added a commit to ROCm/pytorch that referenced this pull request Feb 17, 2021
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Revert pytorch#26394. Fixes pytorch#27356.  Not all MIOpen handles were setting their stream to the current stream prior to running the op.

Pull Request resolved: pytorch#48424

Reviewed By: H-Huang

Differential Revision: D25420384

Pulled By: mruberry

fbshipit-source-id: 051683ba9e3d264b71162bd344031a0c58bf6a41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged module: rocm AMD GPU support for Pytorch open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ROCm tests are run on the default stream

7 participants