
Docker release build: Use 13.0.0 nvidia docker #166904

Closed
atalman wants to merge 4 commits into pytorch:main from atalman:use_cuda_1300_docker

atalman (Contributor) commented Nov 3, 2025

Forward fix for failing Docker release builds
Related to: #166897

The nightly Docker build is failing (https://github.com/pytorch/pytorch/actions/runs/18900508440/job/53946606434) because the base image is missing:

ERROR: failed to build: failed to solve: docker.io/nvidia/cuda:13.0.2-devel-ubuntu22.04: not found
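
A failure like the one above could be caught before the build starts by probing each base image reference first. The following is a minimal sketch, not part of this PR: the helper names `docker_manifest_exists` and `missing_base_images` are hypothetical, and the sketch assumes the `docker` CLI is installed and that `docker manifest inspect` is available in that version.

```python
import subprocess
from typing import Callable, Iterable, List


def docker_manifest_exists(image: str) -> bool:
    """Return True if `docker manifest inspect` can resolve the image reference.

    Assumes the `docker` CLI is on PATH; raises FileNotFoundError otherwise.
    """
    result = subprocess.run(
        ["docker", "manifest", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit code simply means "not resolvable"
    )
    return result.returncode == 0


def missing_base_images(
    images: Iterable[str],
    exists: Callable[[str], bool] = docker_manifest_exists,
) -> List[str]:
    """Return the image references that the `exists` probe cannot resolve."""
    return [image for image in images if not exists(image)]
```

The probe is passed in as a parameter so the check can be exercised without network access; a workflow step could call `missing_base_images([...])` and fail early with a clear message if the returned list is non-empty.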

@atalman atalman requested a review from a team as a code owner November 3, 2025 22:38
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Nov 3, 2025
pytorch-bot bot commented Nov 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166904

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 7 New Failures

As of commit 57224a7 with merge base eea8ff2:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

seemethere (Member) commented

Wait, why do we have to do this? Why can't we just set CUDA_VERSION==13.0.0 by default?

tinglvv (Collaborator) commented Nov 4, 2025

Changing CUDA_VERSION back to 13.0.0 would lead to mismatches between the CUDA version in install_cuda.sh and the version used for the nightly binaries (please see https://github.com/pytorch/pytorch/pull/165470/files, where the binary matrix is updated to the 13.0.2 requirements and the driver is updated to 580.95.05). We would need a full revert of that PR if we want to change CUDA_ARCHES_FULL_VERSION back to 13.0.0.

Could we merge 10545e1 as a temporary fix that uses the 13.0.0 nvidia docker image to unblock the nightly Docker build for now? A permanent fix will follow in https://github.com/pytorch/pytorch/pull/166907/files. cc @seemethere

atalman (Contributor Author) commented Nov 4, 2025

Hi @tinglvv, this should work the same: CUDA_ARCHES_FULL_VERSION is only used in .github/scripts/generate_docker_release_matrix.py.
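
The arrangement discussed in this thread, where the nightly binaries stay on 13.0.2 while the Docker release build pulls a 13.0.0 base image, can be illustrated with a small sketch. This is not the content of `generate_docker_release_matrix.py`; the `BASE_IMAGE_OVERRIDES` table and the `base_image_for` helper are hypothetical names, and the tag layout is assumed from the error message in the PR description.

```python
from typing import Dict

# Hypothetical override table: CUDA version used for the binaries -> version of
# the nvidia/cuda base image to pull when no image is published for that exact
# version. Not taken from the PyTorch repository.
BASE_IMAGE_OVERRIDES: Dict[str, str] = {"13.0.2": "13.0.0"}


def base_image_for(
    cuda_full_version: str,
    flavor: str = "devel",
    distro: str = "ubuntu22.04",
) -> str:
    """Build the base image reference, applying an override if one is listed."""
    image_version = BASE_IMAGE_OVERRIDES.get(cuda_full_version, cuda_full_version)
    return f"docker.io/nvidia/cuda:{image_version}-{flavor}-{distro}"
```

With this table, `base_image_for("13.0.2")` returns `docker.io/nvidia/cuda:13.0.0-devel-ubuntu22.04`, while versions without an override map to a tag with the same version. Keeping the override in one place makes it easy to delete once a permanent fix lands.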

atalman (Contributor Author) commented Nov 4, 2025

@pytorchmergebot merge -f "all required workflows are green"

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

5 participants