Fix cuda dependency not found #8903

Merged
zpcore merged 20 commits into master from piz/fix_cuda_again
Apr 1, 2025

Member

@zpcore zpcore commented Mar 28, 2025

We are trying to align with PyTorch to support CUDA 11.8, 12.6 and 12.8 for the 2.7 release.

However, 12.8 only shows up in debian12: https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/. Originally we used debian11 for the docker image, and here is its support list: https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/.

This PR updates the docker image to debian12 and the corresponding CUDA library.
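
To make the availability gap concrete, here is a minimal sketch of checking which required CUDA versions are missing from a repo index listing. The package-name pattern `cuda-toolkit-<major>-<minor>_...`, the helper names, and the sample listing are assumptions for illustration only; none of them come from this PR.

```python
import re

# Assumed package naming: cuda-toolkit-<major>-<minor>_<version>_amd64.deb
_TOOLKIT_RE = re.compile(r"cuda-toolkit-(\d+)-(\d+)_")


def cuda_versions_in_listing(listing: str) -> list[str]:
    """Return the sorted, de-duplicated CUDA versions named in a listing."""
    found = {(int(major), int(minor)) for major, minor in _TOOLKIT_RE.findall(listing)}
    return [f"{major}.{minor}" for major, minor in sorted(found)]


def missing_versions(required: list[str], listing: str) -> list[str]:
    """Return the required versions that do not appear in the listing."""
    available = set(cuda_versions_in_listing(listing))
    return [version for version in required if version not in available]


# Hypothetical excerpt of a repo index, not real data from the NVIDIA site.
SAMPLE_LISTING = """
cuda-toolkit-11-8_11.8.0-1_amd64.deb
cuda-toolkit-12-6_12.6.3-1_amd64.deb
"""
```

Calling `missing_versions(["11.8", "12.6", "12.8"], SAMPLE_LISTING)` on this hypothetical excerpt reports 12.8 as the missing version, which mirrors the situation described above.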

@zpcore zpcore marked this pull request as ready for review March 31, 2025 20:26
@zpcore zpcore requested review from ManfeiBai and lsy323 as code owners March 31, 2025 20:26
@zpcore zpcore requested a review from ysiraichi March 31, 2025 20:27
Member Author

zpcore commented Mar 31, 2025

Hi @ysiraichi, I updated to debian12 as you suggested. Now the CUDA version can be discovered. Thanks

@ysiraichi
Collaborator

Could you add a PR description with more details on what problem it's solving, and why this works?
It will help if we wonder in the future why we did this.

@zpcore zpcore changed the title from "fix cuda 12.3 dependency" to "Fix cuda dependency not found" Mar 31, 2025
  "nightly": ["11.8", "12.1", "12.6", "12.8"],
- "r2.7": ["11.8", "12.6", "12.8"] # align with PyTorch 2.7 release
+ "r2.7": ["11.8", "12.1", "12.6", "12.8"] # PyTorch 2.7 release only needs 11.8, 12.6, 12.8
  }
Collaborator

Why do we need 12.1 here?

Member Author

Our XLML tests and torchbench use 12.1. I added it to the build in case those tests need it.
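
As a rough illustration of the point in this thread, the sketch below computes which CUDA versions a release builds beyond what PyTorch 2.7 needs. The version lists are copied from the diff under review; the names `CUDA_VERSIONS`, `PYTORCH_2_7_CUDA`, and `extra_versions` are hypothetical and do not reflect the repository's actual code.

```python
# Version lists copied from the diff under review; all names are hypothetical.
PYTORCH_2_7_CUDA = ["11.8", "12.6", "12.8"]

CUDA_VERSIONS = {
    "nightly": ["11.8", "12.1", "12.6", "12.8"],
    "r2.7": ["11.8", "12.1", "12.6", "12.8"],
}


def extra_versions(release: str) -> list[str]:
    """Return versions built for `release` beyond what PyTorch 2.7 needs."""
    return [v for v in CUDA_VERSIONS[release] if v not in PYTORCH_2_7_CUDA]
```

Here `extra_versions("r2.7")` yields only 12.1, the version kept for the XLML tests and torchbench.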

Collaborator

@ManfeiBai ManfeiBai left a comment

Thanks, LGTM

@zpcore zpcore force-pushed the piz/fix_cuda_again branch from 3aeafe7 to 2925e80 Compare March 31, 2025 23:18
Member Author

zpcore commented Apr 1, 2025

I think I will revert to debian11 for now. Updating to debian12 is too risky, not to mention that the dependency is still failing.

@zpcore zpcore merged commit dbb03aa into master Apr 1, 2025
20 of 21 checks passed
3 participants