Use cuda-toolkit for multiple linux cuda packages #174390
atalman wants to merge 3 commits into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174390
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Cancelled Job, 4 Unrelated Failures
As of commit 327922e with merge base fbf54b0:
NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
    "12.6": "12.6.3",
    "12.8": "12.8.1",
    "12.9": "12.9.1",
    "13.0": "13.0.0",
Looks like most packages actually match 13.0.1 rather than 13.0.0.
Should we advance to 13.0.2? https://pypi.org/project/cuda-toolkit/13.0.2
I would split this into two changes then: first move to 13.0.2, and then switch to this cuda-toolkit package.
Last but not least, we should mention those updates in the 2.11 release notes.
Let's first move to cuda-toolkit; this PR is already ready. I can then update to 13.0.2 as a follow-up tomorrow.
I believe we are using 13.0.2 for the binaries (https://github.com/pytorch/pytorch/pull/165470/changes), and this 13.0.0 was a workaround (WAR) put in place for the docker base image issue in #166904, which should be resolved now that #166907 is merged.
fix fix fix fix generate update
e9fee2a to a9151fe
@claude Can you review this PR?
Claude finished @atalman's task in 1m 11s
PR #174390 Review
Summary
This PR simplifies CUDA dependency management for Linux builds by replacing individual
Code Review
Changes are well-structured:
Observation on CUDA 13.0 cuda-toolkit extras: "cuda-toolkit[nvrtc,cudart,cupti,cufft,curand,cusolver,cusparse,cufile,nvjitlink,nvtx]==13.0.1"
While for CUDA 12.x versions,
CI Failures - Root Cause
All 56 CI failures are caused by the same issue - the
The PR description mentions this depends on pytorch/test-infra#7733, which likely needs to be merged first to publish/enable the
Recommendation
This PR is blocked until the dependency PR (pytorch/test-infra#7733) is merged and the
Version Discussion
Per the existing review comments:
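For readers following the thread, here is a minimal sketch of how a requirement string like the one quoted in the review could be assembled. It is an illustration only, not the code from this PR: the names `TOOLKIT_PINS`, `EXTRAS_13_0` and `toolkit_requirement` are hypothetical, the pins are taken from the table headers in the PR description, and the extras list is copied from the CUDA 13.0 string quoted above. The extras used for CUDA 12.x are not visible in this thread, so the sketch does not guess at them.

```python
# Hypothetical sketch; names are illustrative and not taken from the PR diff.

# cuda-toolkit pins as they appear in the PR description's table headers.
TOOLKIT_PINS = {
    "12.6": "12.6.3",
    "12.8": "12.8.1",
    "12.9": "12.9.1",
    "13.0": "13.0.1",
}

# Extras copied from the CUDA 13.0 requirement string quoted in the review.
EXTRAS_13_0 = [
    "nvrtc", "cudart", "cupti", "cufft", "curand",
    "cusolver", "cusparse", "cufile", "nvjitlink", "nvtx",
]


def toolkit_requirement(cuda_version, extras):
    """Build a single pinned requirement, e.g. 'cuda-toolkit[a,b]==X.Y.Z'."""
    pin = TOOLKIT_PINS[cuda_version]
    return f"cuda-toolkit[{','.join(extras)}]=={pin}"


if __name__ == "__main__":
    print(toolkit_requirement("13.0", EXTRAS_13_0))
```

One pinned `cuda-toolkit[...]` requirement stands in for the ten separately pinned `nvidia-*` packages, which is the simplification the review summary describes.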
@pytorchmergebot merge -f "all lint and binary builds look good"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
## Package Version Comparison

Please note: We keep individual packages that are newer than those provided in cuda-toolkit.

Depends on pytorch/test-infra#7733

Fixes: pytorch#163964

### CUDA 12.6

| Package | Old Version (individual) | New Version (cuda-toolkit 12.6.3) | Match |
| --- | --- | --- | --- |
| nvidia-cuda-nvrtc-cu12 | 12.6.77 | 12.6.77 | ✅ |
| nvidia-cuda-runtime-cu12 | 12.6.77 | 12.6.77 | ✅ |
| nvidia-cuda-cupti-cu12 | 12.6.80 | 12.6.80 | ✅ |
| nvidia-cufft-cu12 | 11.3.0.4 | 11.3.0.4 | ✅ |
| nvidia-curand-cu12 | 10.3.7.77 | 10.3.7.77 | ✅ |
| nvidia-cusolver-cu12 | 11.7.1.2 | 11.7.1.2 | ✅ |
| nvidia-cusparse-cu12 | 12.5.4.2 | 12.5.4.2 | ✅ |
| nvidia-cublas-cu12 | 12.6.4.1 | 12.6.4.1 | ✅ |
| nvidia-cufile-cu12 | 1.11.1.6 | 1.11.1.6 | ✅ |
| nvidia-nvjitlink-cu12 | 12.6.85 | 12.6.85 | ✅ |
| nvidia-nvtx-cu12 | 12.6.77 | 12.6.77 | ✅ |

### CUDA 12.8

| Package | Old Version (individual) | New Version (cuda-toolkit 12.8.1) | Match |
| --- | --- | --- | --- |
| nvidia-cuda-nvrtc-cu12 | 12.8.93 | 12.8.93 | ✅ |
| nvidia-cuda-runtime-cu12 | 12.8.90 | 12.8.90 | ✅ |
| nvidia-cuda-cupti-cu12 | 12.8.90 | 12.8.90 | ✅ |
| nvidia-cufft-cu12 | 11.3.3.83 | 11.3.3.83 | ✅ |
| nvidia-curand-cu12 | 10.3.9.90 | 10.3.9.90 | ✅ |
| nvidia-cusolver-cu12 | 11.7.3.90 | 11.7.3.90 | ✅ |
| nvidia-cusparse-cu12 | 12.5.8.93 | 12.5.8.93 | ✅ |
| nvidia-cublas-cu12 | 12.8.4.1 | 12.8.4.1 | ✅ |
| nvidia-cufile-cu12 | 1.13.1.3 | 1.13.1.3 | ✅ |
| nvidia-nvjitlink-cu12 | 12.8.93 | 12.8.93 | ✅ |
| nvidia-nvtx-cu12 | 12.8.90 | 12.8.90 | ✅ |

### CUDA 12.9

| Package | Old Version (individual) | New Version (cuda-toolkit 12.9.1) | Match |
| --- | --- | --- | --- |
| nvidia-cuda-nvrtc-cu12 | 12.9.86 | 12.9.86 | ✅ |
| nvidia-cuda-runtime-cu12 | 12.9.79 | 12.9.79 | ✅ |
| nvidia-cuda-cupti-cu12 | 12.9.79 | 12.9.79 | ✅ |
| nvidia-cufft-cu12 | 11.4.1.4 | 11.4.1.4 | ✅ |
| nvidia-curand-cu12 | 10.3.10.19 | 10.3.10.19 | ✅ |
| nvidia-cusolver-cu12 | 11.7.5.82 | 11.7.5.82 | ✅ |
| nvidia-cusparse-cu12 | 12.5.10.65 | 12.5.10.65 | ✅ |
| nvidia-cublas-cu12 | 12.9.1.4 | 12.9.1.4 | ✅ |
| nvidia-cufile-cu12 | 1.14.1.1 | 1.14.1.1 | ✅ |
| nvidia-nvjitlink-cu12 | 12.9.86 | 12.9.86 | ✅ |
| nvidia-nvtx-cu12 | 12.9.79 | 12.9.79 | ✅ |

### CUDA 13.0

| Package | Old Version (individual) | cuda-toolkit 13.0.1 Version | Match |
| --- | --- | --- | --- |
| nvidia-cuda-nvrtc | 13.0.88 | 13.0.88 | ✅ |
| nvidia-cuda-runtime | ~13.0.48 | 13.0.88 | ✅ |
| nvidia-cuda-cupti | 13.0.85 | 13.0.85 | ✅ |
| nvidia-cufft | 12.0.0.61 | 12.0.0.61 | ✅ |
| nvidia-curand | 10.4.0.35 | 10.4.0.35 | ✅ |
| nvidia-cusolver | 12.0.4.66 | 12.0.4.66 | ✅ |
| nvidia-cusparse | 12.6.3.3 | 12.6.3.3 | ✅ |
| nvidia-cufile | 1.15.1.6 | 1.15.1.6 | ✅ |
| nvidia-nvjitlink | 13.0.88 | 13.0.88 | ✅ |
| nvidia-nvtx | 13.0.85 | 13.0.85 | ✅ |

Pull Request resolved: pytorch#174390
Approved by: https://github.com/malfet
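A comparison like the tables above could be rechecked locally with a short script. The following is a hedged sketch, not the method used to produce the tables: `EXPECTED_CU126`, `installed_version` and `find_mismatches` are hypothetical names, and the expected versions are copied from the CUDA 12.6 table. It relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Expected versions copied from the CUDA 12.6 table in the PR description.
EXPECTED_CU126 = {
    "nvidia-cuda-nvrtc-cu12": "12.6.77",
    "nvidia-cuda-runtime-cu12": "12.6.77",
    "nvidia-cuda-cupti-cu12": "12.6.80",
    "nvidia-cufft-cu12": "11.3.0.4",
    "nvidia-curand-cu12": "10.3.7.77",
    "nvidia-cusolver-cu12": "11.7.1.2",
    "nvidia-cusparse-cu12": "12.5.4.2",
    "nvidia-cublas-cu12": "12.6.4.1",
    "nvidia-cufile-cu12": "1.11.1.6",
    "nvidia-nvjitlink-cu12": "12.6.85",
    "nvidia-nvtx-cu12": "12.6.77",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose found version differs to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    # In an environment without these wheels, every package is reported
    # with None as the found version.
    for name, (want, found) in find_mismatches(EXPECTED_CU126).items():
        print(f"{name}: expected {want}, found {found}")
```

Passing the lookup function as a parameter keeps the comparison testable without installing any CUDA wheels.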
cc @malfet @nWEIdia @tinglvv @ptrblck @DEKHTIARJonathan