[Docker Release] Test if pytorch was compiled with CUDA before pushing to repo by atalman · Pull Request #128852 · pytorch/pytorch

atalman · 2024-06-17T17:02:09Z

Related to: #125879
Checks whether PyTorch was compiled with CUDA before publishing the CUDA Docker nightly image.
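The Dockerfile one-liner used for this check can be sketched as a small Python helper. This is a minimal sketch, not the PR's code. The function name `torch_build_info` is hypothetical, and the `torch_module` parameter exists only so the logic can be exercised with a stand-in object when torch is not installed. It reads the private `torch.cuda._is_compiled()` that the PR calls, plus the public `torch.version.cuda` attribute, which is `None` on CPU-only builds.

```python
from typing import Any, Optional


def torch_build_info(torch_module: Optional[Any] = None) -> dict:
    """Report how a torch build was compiled (hypothetical helper).

    `torch_module` can be injected for testing; by default the real torch is imported.
    """
    if torch_module is None:
        import torch as torch_module  # deferred import: only needed when run for real

    return {
        # Private helper used by the Dockerfile one-liner in this PR.
        "is_compiled": bool(torch_module.cuda._is_compiled()),
        # Public attribute: CUDA version string for CUDA builds, None for CPU-only builds.
        "cuda_version": torch_module.version.cuda,
    }
```

Inside the image, calling `torch_build_info()` with no arguments imports the installed torch; a CUDA image would be expected to report `is_compiled: True` together with a non-`None` `cuda_version`.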

Test

#18 [conda-installs 5/5] RUN IS_CUDA=$(python -c 'import torch ; print(torch.cuda._is_compiled())');    echo "Is torch compiled with cuda: ${IS_CUDA}";     if test "${IS_CUDA}" != "True" -a ! -z "12.4.0"; then 	exit 1;     fi
#18 1.656 Is torch compiled with cuda: False
#18 ERROR: process "/bin/sh -c IS_CUDA=$(python -c 'import torch ; print(torch.cuda._is_compiled())');    echo \"Is torch compiled with cuda: ${IS_CUDA}\";     if test \"${IS_CUDA}\" != \"True\" -a ! -z \"${CUDA_VERSION}\"; then \texit 1;     fi" did not complete successfully: exit code: 1
------
 > [conda-installs 5/5] RUN IS_CUDA=$(python -c 'import torch ; print(torch.cuda._is_compiled())');    echo "Is torch compiled with cuda: ${IS_CUDA}";     if test "${IS_CUDA}" != "True" -a ! -z "12.4.0"; then 	exit 1;     fi:
1.656 Is torch compiled with cuda: False
------
Dockerfile:80
--------------------
  79 |     RUN /opt/conda/bin/pip install torchelastic
  80 | >>> RUN IS_CUDA=$(python -c 'import torch ; print(torch.cuda._is_compiled())');\
  81 | >>>     echo "Is torch compiled with cuda: ${IS_CUDA}"; \
  82 | >>>     if test "${IS_CUDA}" != "True" -a ! -z "${CUDA_VERSION}"; then \
  83 | >>> 	exit 1; \
  84 | >>>     fi
  85 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c IS_CUDA=$(python -c 'import torch ; print(torch.cuda._is_compiled())');    echo \"Is torch compiled with cuda: ${IS_CUDA}\";     if test \"${IS_CUDA}\" != \"True\" -a ! -z \"${CUDA_VERSION}\"; then \texit 1;     fi" did not complete successfully: exit code: 1
(base) [ec2-user@ip-172-30-2-248 pytorch]$ docker buildx build --progress=plain  --platform="linux/amd64"  --target official -t ghcr.io/pytorch/pytorch:2.5.0.dev20240617-cuda12.4-cudnn9-devel --build-arg BASE_IMAGE=nvidia/cuda:12.4.0-devel-ubuntu22.04 --build-arg PYTHON_VERSION=3.11 --build-arg CUDA_VERSION= --build-arg CUDA_CHANNEL=nvidia --build-arg PYTORCH_VERSION=2.5.0.dev20240617 --build-arg INSTALL_CHANNEL=pytorch --build-arg TRITON_VERSION= --build-arg CMAKE_VARS="" .
#0 building with "default" instance using docker driver

Please note: it looks like on PRs we install from the pytorch channel rather than the nightly channel, hence CUDA 12.4 is failing, since it's not in the pytorch channel yet:
https://github.com/pytorch/pytorch/actions/runs/9555354734/job/26338476741?pr=128852
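As the note above suggests, the CUDA 12.4 failures appear to come from a channel/version mismatch: the local test build uses `PYTORCH_VERSION=2.5.0.dev20240617`, a nightly version, together with `INSTALL_CHANNEL=pytorch`. A hypothetical sanity check along these lines could flag that combination before the image is built. It assumes, as a simplification, that `.dev` versions are published to the `pytorch-nightly` conda channel and everything else to `pytorch`; release candidates and other channels are ignored.

```python
def expected_conda_channel(pytorch_version: str) -> str:
    """Hypothetical helper: pick the conda channel a given version should come from.

    Simplifying assumption: versions containing ".dev" are nightlies published to
    "pytorch-nightly"; everything else is treated as a release in "pytorch".
    """
    return "pytorch-nightly" if ".dev" in pytorch_version else "pytorch"


def channel_mismatch(pytorch_version: str, install_channel: str) -> bool:
    """True when INSTALL_CHANNEL does not match the channel the version implies."""
    return expected_conda_channel(pytorch_version) != install_channel
```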

pytorch-bot · 2024-06-17T17:02:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128852

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 758deb0 with merge base a87d82a:

NEW FAILURES - The following jobs have failed:

Build Official Docker Images / build (12.4, 12.4.0, 9, devel, linux/amd64) (gh)
Process completed with exit code 2.
Build Official Docker Images / build (12.4, 12.4.0, 9, runtime, linux/amd64) (gh)
Process completed with exit code 2.

This comment was automatically generated by Dr. CI and updates every 15 minutes.


malfet · 2024-06-17T21:44:32Z

Dockerfile

    esac && \
    /opt/conda/bin/conda clean -ya
 RUN /opt/conda/bin/pip install torchelastic
+RUN IS_CUDA=$(python -c 'import torch ; print(torch.cuda._is_compiled())'); \


What if a user wants to build a CPU Docker version? Should we somehow map IS_CUDA to, say, INSTALL_CHANNEL to figure out whether we want the cu or cpu versions?

Done. Added check

    if test "${IS_CUDA}" != "True" -a ! -z "${CUDA_VERSION}"; then \
        exit 1; \
    fi
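With this check, the step only fails when a CUDA image was requested (non-empty `CUDA_VERSION`) and torch does not report a CUDA build, so CPU images built with an empty `CUDA_VERSION` pass. A Python sketch mirroring the shell condition, for illustration only (`should_fail` and `exit_code` are hypothetical names):

```python
def should_fail(is_cuda: str, cuda_version: str) -> bool:
    """Return True when the Docker build step should exit 1.

    Mirrors: test "${IS_CUDA}" != "True" -a ! -z "${CUDA_VERSION}"
    is_cuda: printed output of torch.cuda._is_compiled(), i.e. "True" or "False"
             (or "" if the python call itself failed).
    cuda_version: the CUDA_VERSION build arg, empty for CPU images.
    """
    # `! -z "${CUDA_VERSION}"`: a CUDA image was requested.
    cuda_requested = cuda_version != ""
    # `"${IS_CUDA}" != "True"`: the installed torch is not a CUDA build.
    not_cuda_build = is_cuda != "True"
    return cuda_requested and not_cuda_build


def exit_code(is_cuda: str, cuda_version: str) -> int:
    """Exit status the RUN step would produce (1 = fail the image build)."""
    return 1 if should_fail(is_cuda, cuda_version) else 0
```

An empty `is_cuda` (for example, if the `python -c` call itself failed) also fails a CUDA build, which matches the shell behavior since `"" != "True"`.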

atalman · 2024-06-17T22:49:34Z

@pytorchmergebot merge -f "lint is green, cuda 12.4 failures are expected"

pytorchmergebot · 2024-06-17T22:51:03Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Add docker test for cuda presence

02674c1

atalman requested a review from a team as a code owner June 17, 2024 17:02

pytorch-bot bot added the topic: not user facing topic category label Jun 17, 2024

test

9cc60d8

test test test test test test test test test test test test test fix test fix

atalman force-pushed the test_cuda_cpu_docker branch from ab75991 to 9cc60d8 June 17, 2024 21:40

atalman changed the title ~~[Docker Release] Test if pytorch was compiled with CUDA before publishing latest tag~~ [Docker Release] Test if pytorch was compiled with CUDA before pushing to repo Jun 17, 2024

malfet reviewed Jun 17, 2024


add_cuda

758deb0

malfet approved these changes Jun 17, 2024


pytorchmergebot added the merging label Jun 17, 2024

pytorchmergebot added the Merged label Jun 17, 2024

pytorchmergebot closed this in 3b8c9b8 Jun 17, 2024

pytorchmergebot removed the merging label Jun 17, 2024

atalman wants to merge 3 commits into pytorch:main from atalman:test_cuda_cpu_docker

3 participants
