Advance nightly docker to 11.6 #86941

Closed
atalman wants to merge 5 commits into pytorch:master from atalman:test_cuda_116_doker

atalman (Contributor) commented Oct 13, 2022

Fixes the following crash in Docker builds:
https://github.com/pytorch/pytorch/actions/runs/3242695506/jobs/5316334351
The crash was introduced by #82682.

That PR seems to introduce changes that are not compatible with CUDA 11.3, which our Docker builds use.

atalman requested a review from a team as a code owner October 13, 2022 21:22
pytorch-bot bot commented Oct 13, 2022

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86941

❌ 3 Failures, 1 Pending as of commit ccdbb27.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 13, 2022
Dockerfile Outdated

Contributor:
Does the pytorch conda channel contain pytorch-cuda packages?

Contributor Author:
These two channels should contain the right packages:

ARG CUDA_CHANNEL=nvidia
ARG INSTALL_CHANNEL=pytorch-nightly

The pytorch-nightly channel contains pytorch-cuda; the pytorch channel does not yet.
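
To make the channel question concrete, here is a minimal sketch of how the two build args might feed a conda install command. The function name `conda_install_cmd`, the package list, and the argument order are assumptions for illustration and are not taken from the Dockerfile under review; only the `pytorch-cuda=<major.minor>` spec follows the form used to select a CUDA build from conda.

```python
from typing import List


def conda_install_cmd(
    install_channel: str = "pytorch-nightly",
    cuda_channel: str = "nvidia",
    cuda_version: str = "11.6.2",
) -> List[str]:
    """Build a conda command that pins pytorch-cuda to a CUDA major.minor version.

    Hypothetical helper: the defaults mirror the ARG values quoted above,
    everything else is an assumption.
    """
    # Keep only "major.minor", e.g. "11.6.2" -> "11.6".
    major_minor = ".".join(cuda_version.split(".")[:2])
    return [
        "conda", "install", "-y",
        "-c", install_channel,  # channel expected to provide pytorch and pytorch-cuda
        "-c", cuda_channel,     # channel expected to provide the CUDA runtime packages
        "pytorch",
        f"pytorch-cuda={major_minor}",
    ]


if __name__ == "__main__":
    print(" ".join(conda_install_cmd()))
```

If the install channel does not carry pytorch-cuda, a command of this shape cannot resolve the pin, which is the concern raised in this thread.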

atalman force-pushed the test_cuda_116_doker branch 2 times, most recently from b93eb95 to 2cbb202 October 14, 2022 00:29
Docker builds fix

Fix docker file

Testing

Advance Docker builds to cuda 11.6

Fix typo

Fix typo

Fix cuda version

test

Tune substitution

test

test fix

test

test

test

test

testing
atalman force-pushed the test_cuda_116_doker branch from 33bcc85 to 7302d2f October 14, 2022 01:11
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 14, 2022
docker.Makefile Outdated
  CUDA_CHANNEL = nvidia
  # The conda channel to use to install pytorch / torchvision
- INSTALL_CHANNEL ?= pytorch
+ INSTALL_CHANNEL ?= pytorch-nightly

Member:
Oh wait, we shouldn't change the defaults for this.

Member:
There might be users who rely on the default behavior, so changing this could be very problematic.

Contributor Author:
The issue with the pytorch channel is that it does not contain pytorch-cuda yet. I think publishing the pytorch-cuda package to the pytorch channel should fix this issue.
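
The disagreement here turns on how a `?=` default behaves: Make assigns the value only when the variable is not already defined, so a caller can override it without the default itself changing. The following is a rough Python analogy of that rule, not Make itself; `resolve_install_channel` and `DEFAULT_INSTALL_CHANNEL` are hypothetical names introduced for illustration.

```python
from typing import Mapping

# Assumed default, matching the value the reviewers want to keep.
DEFAULT_INSTALL_CHANNEL = "pytorch"


def resolve_install_channel(overrides: Mapping[str, str]) -> str:
    """Mimic `INSTALL_CHANNEL ?= pytorch`: a caller-supplied value wins, else the default."""
    return overrides.get("INSTALL_CHANNEL", DEFAULT_INSTALL_CHANNEL)


if __name__ == "__main__":
    # A user who sets nothing keeps the existing behavior.
    print(resolve_install_channel({}))
    # A nightly job can opt in explicitly without touching the default.
    print(resolve_install_channel({"INSTALL_CHANNEL": "pytorch-nightly"}))
```

Under this reading, a nightly workflow could pass its own channel while users relying on the default see no change.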

atalman (Contributor Author) commented Oct 17, 2022

Please note that the current failure of Build Official Docker Image is caused by this issue:

ContinuumIO/anaconda-issues#13073

atalman (Contributor Author) commented Oct 19, 2022

@pytorchmergebot merge -f "some build failures expected"

pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

atalman (Contributor Author) commented Oct 20, 2022

@pytorchmergebot revert -c "Workflow is passing but installs CUDA 11.3 PyTorch rather then 11.6"

pytorch-bot bot commented Oct 20, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: argument -c/--classification: invalid choice: 'Workflow is passing but installs CUDA 11.3 PyTorch rather then 11.6' (choose from 'nosignal', 'ignoredsignal', 'landrace', 'weird', 'ghfirst')

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.
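
The error above is the kind of message Python's `argparse` emits when a `choices` option receives a value outside its allowed set. Below is a minimal sketch that reproduces that behavior, assuming an argparse-style parser; it is not pytorchbot's actual implementation, and `build_revert_parser` and `parse_revert` are hypothetical names. Only the option names and the five classification values are taken from the usage text above.

```python
import argparse
from typing import List, Optional

# Classification values copied from the bot's usage message.
CLASSIFICATIONS = ["nosignal", "ignoredsignal", "landrace", "weird", "ghfirst"]


def build_revert_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="@pytorchbot revert")
    # Free-text reason for the revert.
    parser.add_argument("-m", "--message", required=True)
    # Must be one of the fixed categories; free text is rejected here.
    parser.add_argument("-c", "--classification", required=True, choices=CLASSIFICATIONS)
    return parser


def parse_revert(args: List[str]) -> Optional[argparse.Namespace]:
    """Return the parsed arguments, or None if argparse rejects them."""
    try:
        return build_revert_parser().parse_args(args)
    except SystemExit:
        # argparse prints its usage and error text to stderr, then exits.
        return None


if __name__ == "__main__":
    # Free text passed to -c is rejected as an invalid choice.
    print(parse_revert(["-c", "some free-text reason"]))
    # The reason goes in -m and the category in -c.
    print(parse_revert(["-m", "some free-text reason", "-c", "weird"]))
```

The fix is to move the explanation into `-m` and pass one of the listed categories to `-c`.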

atalman (Contributor Author) commented Oct 20, 2022

@pytorchmergebot revert -m "Workflow is passing but installs CUDA 11.3 PyTorch rather then 11.6" -c weird

pytorchmergebot (Collaborator):

@pytorchbot successfully started a revert job.

pytorchmergebot (Collaborator):

@atalman your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 20, 2022
This reverts commit c5de535.

Reverted #86941 on behalf of https://github.com/atalman due to Workflow is passing but installs CUDA 11.3 PyTorch rather then 11.6
atalman reopened this Oct 27, 2022
atalman closed this Oct 27, 2022
pytorchmergebot pushed a commit that referenced this pull request Oct 28, 2022
Fixes the following crash in Docker builds:
https://github.com/pytorch/pytorch/actions/runs/3242695506/jobs/5316334351
The crash was introduced by #82682.

That PR seems to introduce changes that are not compatible with CUDA 11.3, which our Docker builds use.

This is a reland of the original PR, #86941 (this new PR was created to start fresh). The original was reverted because conda install installed the wrong version of PyTorch: it still installed PyTorch for CUDA 11.3 rather than 11.6.

This should be fixed now with Release 1.13.
Pull Request resolved: #87858
Approved by: https://github.com/seemethere, https://github.com/malfet, https://github.com/izaitsevfb
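
The revert reason was that the workflow passed while the image contained a PyTorch build for the wrong CUDA version. A check along the following lines could catch that inside a built image. This is a sketch under assumptions: `cuda_matches` and `installed_torch_cuda` are hypothetical helpers, and nothing in this PR says such a check was added. `torch.version.cuda` is the attribute PyTorch uses to report the CUDA version it was built against; it is `None` for CPU-only builds.

```python
from typing import Optional


def cuda_matches(expected: str, actual: Optional[str]) -> bool:
    """True when `actual` has the same CUDA major.minor as `expected`."""
    if actual is None:  # CPU-only builds report no CUDA version
        return False
    return actual.split(".")[:2] == expected.split(".")[:2]


def installed_torch_cuda() -> Optional[str]:
    """Return the CUDA version torch was built with, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper also runs where torch is absent
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    expected = "11.6"  # assumed target version for the nightly image
    actual = installed_torch_cuda()
    if not cuda_matches(expected, actual):
        raise SystemExit(f"expected CUDA {expected}, found {actual}")
    print(f"PyTorch CUDA build matches {expected}")
```

Run as a final build step, a script like this would make the job fail when the CUDA version is wrong.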
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Labels: ciflow/trunk, Merged, Reverted, topic: not user facing
