Fix cuda Manylinux 2_28 docker images PATH setting #139631

Closed
atalman wants to merge 1 commit into pytorch:main from atalman:fix_manylinux_228_docker

atalman (Contributor) commented Nov 4, 2024

Enabling Manywheel builds here: #138732

During the build I observed the following failure in the CUDA jobs:

-- Compiler does not support SVE extension. Will not build perfkernels.
-- Found CUDA: /usr/local/cuda (found version "11.8") 
-- The CUDA compiler identification is unknown
CMake Error at cmake/public/cuda.cmake:47 (enable_language):
  No CMAKE_CUDA_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
  path to the compiler, or to the compiler name if it is in the PATH.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:851 (include)

The correct sequence is supposed to be:

-- Found CUDA: /usr/local/cuda (found version "11.8") 
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89") 

The issue turned out to be a missing PATH setting in the 2_28 Dockerfile. This section exists in the CentOS Dockerfile here:
https://github.com/pytorch/pytorch/blob/main/.ci/docker/manywheel/Dockerfile#L174-L175

(Please note that these Docker images are not used yet. #138732 should enable using them.)
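
For readers who want to check for the same symptom inside a container, here is a minimal shell sketch. It is not the Dockerfile change from this PR. It only illustrates the effect of putting the CUDA `bin` directory on `PATH`. The helper name `prepend_to_path` is hypothetical, and the default `/usr/local/cuda` location is an assumption taken from the `Found CUDA: /usr/local/cuda` and `/usr/local/cuda/bin/nvcc` log lines above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PyTorch CI scripts):
# prepend a directory to PATH unless it is already listed.
prepend_to_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                 # already present, leave PATH unchanged
        *) PATH="$1:${PATH}" ;;      # otherwise put it in front
    esac
    export PATH
}

# Assumed toolkit location, based on the CMake log lines quoted above.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
prepend_to_path "${CUDA_HOME}/bin"

# Report whether the CUDA compiler can now be found by name,
# which is what CMake needs when CUDACXX and CMAKE_CUDA_COMPILER are unset.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at $(command -v nvcc)"
else
    echo "nvcc not on PATH; check that ${CUDA_HOME}/bin exists in the image"
fi
```

As the CMake error text itself notes, setting the `CUDACXX` environment variable or the `CMAKE_CUDA_COMPILER` cache entry to the full compiler path is an alternative when changing `PATH` is not an option.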

@atalman atalman requested a review from jeffdaily as a code owner November 4, 2024 16:29
pytorch-bot bot commented Nov 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139631

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ebfabf4 with merge base 87404b6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Nov 4, 2024
malfet (Contributor) left a comment

LGTM, just curious why this is not part of common/install_cuda.sh

atalman (Contributor, Author) commented Nov 4, 2024

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 4, 2024
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
topic: not user facing (topic category)

4 participants