Use cuda-toolkit for multiple linux cuda packages #174390

Closed
atalman wants to merge 3 commits into pytorch:main from atalman:fix_cuda_pinning

atalman (Contributor) commented Feb 5, 2026

Package Version Comparison

Please note: We keep individual packages that are newer than those provided in cuda-toolkit.
Depends on pytorch/test-infra#7733
Fixes: #163964

CUDA 12.6

| Package | Old Version (individual) | New Version (cuda-toolkit 12.6.3) | Match |
| --- | --- | --- | --- |
| nvidia-cuda-nvrtc-cu12 | 12.6.77 | 12.6.77 | ✅ |
| nvidia-cuda-runtime-cu12 | 12.6.77 | 12.6.77 | ✅ |
| nvidia-cuda-cupti-cu12 | 12.6.80 | 12.6.80 | ✅ |
| nvidia-cufft-cu12 | 11.3.0.4 | 11.3.0.4 | ✅ |
| nvidia-curand-cu12 | 10.3.7.77 | 10.3.7.77 | ✅ |
| nvidia-cusolver-cu12 | 11.7.1.2 | 11.7.1.2 | ✅ |
| nvidia-cusparse-cu12 | 12.5.4.2 | 12.5.4.2 | ✅ |
| nvidia-cublas-cu12 | 12.6.4.1 | 12.6.4.1 | ✅ |
| nvidia-cufile-cu12 | 1.11.1.6 | 1.11.1.6 | ✅ |
| nvidia-nvjitlink-cu12 | 12.6.85 | 12.6.85 | ✅ |
| nvidia-nvtx-cu12 | 12.6.77 | 12.6.77 | ✅ |

CUDA 12.8

| Package | Old Version (individual) | New Version (cuda-toolkit 12.8.1) | Match |
| --- | --- | --- | --- |
| nvidia-cuda-nvrtc-cu12 | 12.8.93 | 12.8.93 | ✅ |
| nvidia-cuda-runtime-cu12 | 12.8.90 | 12.8.90 | ✅ |
| nvidia-cuda-cupti-cu12 | 12.8.90 | 12.8.90 | ✅ |
| nvidia-cufft-cu12 | 11.3.3.83 | 11.3.3.83 | ✅ |
| nvidia-curand-cu12 | 10.3.9.90 | 10.3.9.90 | ✅ |
| nvidia-cusolver-cu12 | 11.7.3.90 | 11.7.3.90 | ✅ |
| nvidia-cusparse-cu12 | 12.5.8.93 | 12.5.8.93 | ✅ |
| nvidia-cublas-cu12 | 12.8.4.1 | 12.8.4.1 | ✅ |
| nvidia-cufile-cu12 | 1.13.1.3 | 1.13.1.3 | ✅ |
| nvidia-nvjitlink-cu12 | 12.8.93 | 12.8.93 | ✅ |
| nvidia-nvtx-cu12 | 12.8.90 | 12.8.90 | ✅ |

CUDA 12.9

| Package | Old Version (individual) | New Version (cuda-toolkit 12.9.1) | Match |
| --- | --- | --- | --- |
| nvidia-cuda-nvrtc-cu12 | 12.9.86 | 12.9.86 | ✅ |
| nvidia-cuda-runtime-cu12 | 12.9.79 | 12.9.79 | ✅ |
| nvidia-cuda-cupti-cu12 | 12.9.79 | 12.9.79 | ✅ |
| nvidia-cufft-cu12 | 11.4.1.4 | 11.4.1.4 | ✅ |
| nvidia-curand-cu12 | 10.3.10.19 | 10.3.10.19 | ✅ |
| nvidia-cusolver-cu12 | 11.7.5.82 | 11.7.5.82 | ✅ |
| nvidia-cusparse-cu12 | 12.5.10.65 | 12.5.10.65 | ✅ |
| nvidia-cublas-cu12 | 12.9.1.4 | 12.9.1.4 | ✅ |
| nvidia-cufile-cu12 | 1.14.1.1 | 1.14.1.1 | ✅ |
| nvidia-nvjitlink-cu12 | 12.9.86 | 12.9.86 | ✅ |
| nvidia-nvtx-cu12 | 12.9.79 | 12.9.79 | ✅ |

CUDA 13.0

| Package | Old Version (individual) | cuda-toolkit 13.0.1 Version | Match |
| --- | --- | --- | --- |
| nvidia-cuda-nvrtc | 13.0.88 | 13.0.88 | ✅ |
| nvidia-cuda-runtime | ~13.0.48 | 13.0.88 | ✅ |
| nvidia-cuda-cupti | 13.0.85 | 13.0.85 | ✅ |
| nvidia-cufft | 12.0.0.61 | 12.0.0.61 | ✅ |
| nvidia-curand | 10.4.0.35 | 10.4.0.35 | ✅ |
| nvidia-cusolver | 12.0.4.66 | 12.0.4.66 | ✅ |
| nvidia-cusparse | 12.6.3.3 | 12.6.3.3 | ✅ |
| nvidia-cufile | 1.15.1.6 | 1.15.1.6 | ✅ |
| nvidia-nvjitlink | 13.0.88 | 13.0.88 | ✅ |
| nvidia-nvtx | 13.0.85 | 13.0.85 | ✅ |
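
The Match column in the CUDA 13.0 table can be reproduced mechanically. The following is a minimal sketch, not code from this PR. It assumes that the leading `~` in the old `nvidia-cuda-runtime` pin (`~13.0.48`) denotes a compatible-release constraint in the spirit of PEP 440's `~=`, which is an interpretation of the table rather than something the PR states. The helper names `parse_version` and `matches` are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '13.0.88' into (13, 0, 88)."""
    return tuple(int(part) for part in text.split("."))


def matches(old_pin: str, new_version: str) -> bool:
    """Return True if new_version satisfies old_pin.

    A plain pin must be exactly equal. A pin starting with '~' is treated
    as a compatible release (assumption): all components except the last
    must be equal, and the new version must not be older than the pin.
    """
    new = parse_version(new_version)
    if old_pin.startswith("~"):
        old = parse_version(old_pin.lstrip("~="))
        return new[: len(old) - 1] == old[:-1] and new >= old
    return parse_version(old_pin) == new


# A few rows copied from the CUDA 13.0 table: (package, old pin, new version).
rows = [
    ("nvidia-cuda-nvrtc", "13.0.88", "13.0.88"),
    ("nvidia-cuda-runtime", "~13.0.48", "13.0.88"),
    ("nvidia-cufft", "12.0.0.61", "12.0.0.61"),
]

for package, old_pin, new_version in rows:
    print(package, matches(old_pin, new_version))
```

Under that assumption, the `nvidia-cuda-runtime` row counts as a match even though the two version strings differ, because 13.0.88 stays within the 13.0 series and is not older than 13.0.48.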

cc @malfet @nWEIdia @tinglvv @ptrblck @DEKHTIARJonathan

atalman requested a review from a team as a code owner February 5, 2026 17:06

pytorch-bot bot commented Feb 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174390

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 4 Unrelated Failures

As of commit 327922e with merge base fbf54b0:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

"12.6": "12.6.3",
"12.8": "12.8.1",
"12.9": "12.9.1",
"13.0": "13.0.0",

atalman (Contributor Author) Feb 5, 2026

Looks like most packages actually match 13.0.1 rather than 13.0.0.
Should we advance to 13.0.2? https://pypi.org/project/cuda-toolkit/13.0.2

Contributor

I would split this into two changes then: first move to 13.0.2, then switch to this cuda-toolkit package.
Last but not least, we should mention those updates in the 2.11 release notes.

atalman (Contributor Author) Feb 5, 2026

Let's first move to cuda-toolkit; this PR is already ready. I can then update to 13.0.2 as a follow-up tomorrow.

Collaborator

I believe we are using 13.0.2 for the binaries (https://github.com/pytorch/pytorch/pull/165470/changes), and this 13.0.0 was a workaround (WAR) put in place to resolve the docker base image issue in #166904, which should be resolved now that #166907 is merged.

Contributor Author

Thank you @tinglvv, using 13.0.2.
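
To make the thread above concrete, here is a minimal sketch of how a per-CUDA-version map like the one under review could be turned into a single `cuda-toolkit` requirement string. It is not the code from `generate_binary_build_matrix.py`. The names `CUDA_TOOLKIT_VERSIONS` and `toolkit_requirement` are hypothetical, the map values are the ones shown in the reviewed snippet (the thread discusses changing the 13.0 entry), and the `platform_system == "Linux"` marker is taken from the pip error quoted later in the review.

```python
# Toolkit release per CUDA minor version, copied from the reviewed snippet.
# The review thread discusses moving the "13.0" entry to 13.0.2.
CUDA_TOOLKIT_VERSIONS = {
    "12.6": "12.6.3",
    "12.8": "12.8.1",
    "12.9": "12.9.1",
    "13.0": "13.0.0",
}


def toolkit_requirement(cuda_version: str, extras: list[str]) -> str:
    """Build one pinned requirement with extras and a Linux-only marker."""
    version = CUDA_TOOLKIT_VERSIONS[cuda_version]
    extras_part = ",".join(extras)
    return f'cuda-toolkit[{extras_part}]=={version}; platform_system == "Linux"'


# Illustrative subset of extras; the full list depends on the CUDA version.
print(toolkit_requirement("12.6", ["nvrtc", "cudart", "cublas"]))
```

Keeping the toolkit release in one map means a version bump such as the 13.0.2 follow-up touches a single entry rather than one pin per library.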

fix

fix

fix

fix

generate

update
atalman added the ciflow/binaries label (Trigger all binary build and upload jobs on the PR) Feb 5, 2026

atalman (Contributor Author) commented Feb 6, 2026

@claude Can you review this PR

claude bot commented Feb 6, 2026

Claude finished @atalman's task in 1m 11s

PR #174390 Review

  • Read changed files and understand the PR
  • Analyze the changes for correctness and potential issues
  • Check CI failures and provide feedback
  • Provide comprehensive review feedback

Summary

This PR simplifies CUDA dependency management for Linux builds by replacing individual nvidia-* packages with the unified cuda-toolkit meta-package. This is a good maintenance improvement that reduces complexity.

Code Review

Changes are well-structured:

  • Replaces ~15 individual nvidia package dependencies with a single cuda-toolkit[...] package per CUDA version
  • Updates CUDA 13.0 from 13.0.0 to 13.0.1 (.github/scripts/generate_binary_build_matrix.py:31)
  • Correctly preserves packages that aren't part of cuda-toolkit: nvidia-cudnn-*, nvidia-cusparselt-*, nvidia-nccl-*, nvidia-nvshmem-*, and cuda-bindings
  • For CUDA 13.0, also keeps nvidia-cublas separate (not included in the cuda-toolkit extras list for 13.0)

Observation on CUDA 13.0 cuda-toolkit extras:
I noticed that for CUDA 13.0 (line 77), cublas is not included in the cuda-toolkit extras list:

"cuda-toolkit[nvrtc,cudart,cupti,cufft,curand,cusolver,cusparse,cufile,nvjitlink,nvtx]==13.0.1"

While for CUDA 12.x versions, cublas is included. This is intentional (as nvidia-cublas==13.1.0.3 is listed separately), but worth confirming this is expected behavior.
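
The observation above can be checked with a small script. This is a sketch using only the standard library, not tooling from the PR. The regular expression handles just the simple `name[extras]==version` shape quoted in this review, not the full PEP 508 grammar, and the names `REQ_PATTERN` and `parse_requirement` are hypothetical.

```python
import re

# Matches only the simple "name[extra1,extra2]==version" form quoted above.
REQ_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z0-9_.-]+)\[(?P<extras>[^\]]*)\]==(?P<version>[0-9.]+)$"
)


def parse_requirement(req: str) -> tuple[str, set[str], str]:
    """Split a pinned requirement into (name, extras, version)."""
    match = REQ_PATTERN.match(req.strip())
    if match is None:
        raise ValueError(f"unsupported requirement format: {req!r}")
    extras = {item.strip() for item in match["extras"].split(",") if item.strip()}
    return match["name"], extras, match["version"]


# The CUDA 13.0 requirement string exactly as quoted in the review.
req_13 = (
    "cuda-toolkit[nvrtc,cudart,cupti,cufft,curand,cusolver,"
    "cusparse,cufile,nvjitlink,nvtx]==13.0.1"
)

name, extras, version = parse_requirement(req_13)
print(name, version, sorted(extras))
print("cublas listed as an extra:", "cublas" in extras)
```

For the quoted 13.0 string the check reports that `cublas` is absent from the extras, which is consistent with `nvidia-cublas` being pinned separately for that version.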

CI Failures - Root Cause

All 56 CI failures are caused by the same issue - the cuda-toolkit package is not available on PyPI yet:

ERROR: Could not find a version that satisfies the requirement cuda-toolkit==12.6.3; platform_system == "Linux" (from torch)
ERROR: No matching distribution found for cuda-toolkit==12.6.3; platform_system == "Linux"

The PR description mentions this depends on pytorch/test-infra#7733, which likely needs to be merged first to publish/enable the cuda-toolkit package on the PyTorch index.

Recommendation

This PR is blocked until the dependency PR (pytorch/test-infra#7733) is merged and the cuda-toolkit package is available. Once that's done, the CI should pass.

Version Discussion

Per the existing review comments:


atalman added a commit to pytorch/test-infra that referenced this pull request Feb 6, 2026

atalman (Contributor Author) commented Feb 6, 2026

@pytorchmergebot merge -f "all lint and binary builds look good"

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

huydhn mentioned this pull request Feb 12, 2026
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
## Package Version Comparison

Please note: We keep individual packages that are newer than those provided in cuda-toolkit.
Depends on pytorch/test-infra#7733
Fixes: pytorch#163964

  ### CUDA 12.6

  | Package | Old Version (individual) | New Version (cuda-toolkit 12.6.3) | Match |
  | --- | --- | --- | --- |
  | nvidia-cuda-nvrtc-cu12 | 12.6.77 | 12.6.77 | ✅ |
  | nvidia-cuda-runtime-cu12 | 12.6.77 | 12.6.77 | ✅ |
  | nvidia-cuda-cupti-cu12 | 12.6.80 | 12.6.80 | ✅ |
  | nvidia-cufft-cu12 | 11.3.0.4 | 11.3.0.4 | ✅ |
  | nvidia-curand-cu12 | 10.3.7.77 | 10.3.7.77 | ✅ |
  | nvidia-cusolver-cu12 | 11.7.1.2 | 11.7.1.2 | ✅ |
  | nvidia-cusparse-cu12 | 12.5.4.2 | 12.5.4.2 | ✅ |
  | nvidia-cublas-cu12 | 12.6.4.1 | 12.6.4.1 | ✅ |
  | nvidia-cufile-cu12 | 1.11.1.6 | 1.11.1.6 | ✅ |
  | nvidia-nvjitlink-cu12 | 12.6.85 | 12.6.85 | ✅ |
  | nvidia-nvtx-cu12 | 12.6.77 | 12.6.77 | ✅ |

  ### CUDA 12.8

  | Package | Old Version (individual) | New Version (cuda-toolkit 12.8.1) | Match |
  | --- | --- | --- | --- |
  | nvidia-cuda-nvrtc-cu12 | 12.8.93 | 12.8.93 | ✅ |
  | nvidia-cuda-runtime-cu12 | 12.8.90 | 12.8.90 | ✅ |
  | nvidia-cuda-cupti-cu12 | 12.8.90 | 12.8.90 | ✅ |
  | nvidia-cufft-cu12 | 11.3.3.83 | 11.3.3.83 | ✅ |
  | nvidia-curand-cu12 | 10.3.9.90 | 10.3.9.90 | ✅ |
  | nvidia-cusolver-cu12 | 11.7.3.90 | 11.7.3.90 | ✅ |
  | nvidia-cusparse-cu12 | 12.5.8.93 | 12.5.8.93 | ✅ |
  | nvidia-cublas-cu12 | 12.8.4.1 | 12.8.4.1 | ✅ |
  | nvidia-cufile-cu12 | 1.13.1.3 | 1.13.1.3 | ✅ |
  | nvidia-nvjitlink-cu12 | 12.8.93 | 12.8.93 | ✅ |
  | nvidia-nvtx-cu12 | 12.8.90 | 12.8.90 | ✅ |

  ### CUDA 12.9

  | Package | Old Version (individual)  | New Version (cuda-toolkit 12.9.1) | Match |
  | --- | --- | --- | --- |
  | nvidia-cuda-nvrtc-cu12 | 12.9.86 | 12.9.86 | ✅ |
  | nvidia-cuda-runtime-cu12 | 12.9.79 | 12.9.79 | ✅ |
  | nvidia-cuda-cupti-cu12 | 12.9.79 | 12.9.79 | ✅ |
  | nvidia-cufft-cu12 | 11.4.1.4 | 11.4.1.4 | ✅ |
  | nvidia-curand-cu12 | 10.3.10.19 | 10.3.10.19 | ✅ |
  | nvidia-cusolver-cu12 | 11.7.5.82 | 11.7.5.82 | ✅ |
  | nvidia-cusparse-cu12 | 12.5.10.65 | 12.5.10.65 | ✅ |
  | nvidia-cublas-cu12 | 12.9.1.4 | 12.9.1.4 | ✅ |
  | nvidia-cufile-cu12 | 1.14.1.1 | 1.14.1.1 | ✅ |
  | nvidia-nvjitlink-cu12 | 12.9.86 | 12.9.86 | ✅ |
  | nvidia-nvtx-cu12 | 12.9.79 | 12.9.79 | ✅ |

  ### CUDA 13.0

  | Package | Old Version (individual)  | cuda-toolkit 13.0.1 Version | Match |
  | --- | --- | --- | --- |
  | nvidia-cuda-nvrtc | 13.0.88 | 13.0.88 | ✅ |
  | nvidia-cuda-runtime | ~13.0.48 | 13.0.88 | ✅ |
  | nvidia-cuda-cupti | 13.0.85 | 13.0.85 | ✅ |
  | nvidia-cufft | 12.0.0.61 | 12.0.0.61 | ✅ |
  | nvidia-curand | 10.4.0.35 | 10.4.0.35 | ✅ |
  | nvidia-cusolver | 12.0.4.66 | 12.0.4.66 | ✅ |
  | nvidia-cusparse | 12.6.3.3 | 12.6.3.3 | ✅ |
  | nvidia-cufile | 1.15.1.6 | 1.15.1.6 | ✅ |
  | nvidia-nvjitlink | 13.0.88 | 13.0.88 | ✅ |
  | nvidia-nvtx | 13.0.85 | 13.0.85 | ✅ |

Pull Request resolved: pytorch#174390
Approved by: https://github.com/malfet
Labels

ciflow/binaries (Trigger all binary build and upload jobs on the PR), Merged, topic: binaries, topic: not user facing (topic category)

Development

Successfully merging this pull request may close these issues.

[BE] use cuda-toolkit metapackage to specify pypi dependencies

4 participants