Add CUDA 12.6 Linux Builds to Binaries Matrix by tinglvv · Pull Request #138899 · pytorch/pytorch

tinglvv · 2024-10-25T06:54:09Z

Related to #138440

Issue tracker: #138609

Version based on https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

cc @albanD @ptrblck @atalman @malfet @nWEIdia

pytorch-bot · 2024-10-25T06:54:12Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138899

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit d55064c with merge base ea0f60e:

NEW FAILURE - The following job has failed:

Lint / pr-sanity-checks (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet

Why are we adding a new flavor? Let's delete something (for example 12.1)

.github/scripts/generate_binary_build_matrix.py

tinglvv · 2024-10-25T20:57:00Z

Removing 12.1 for the nightly binary build per suggestion.
CI/docker images will be deprecated at a later stage.

tinglvv · 2024-10-29T22:15:10Z

Not sure whether we should remove 12.1 from LINUX_BINARY_SMOKE_WORKFLOWS. Removing it temporarily because of the error below:

tingl@tingl-mlt pytorch % sh .github/regenerate.sh 
Traceback (most recent call last):
  File "/Users/tingl/Documents/github/pytorch/.github/scripts/generate_ci_workflows.py", line 177, in <module>
    build_configs=generate_binary_build_matrix.generate_wheels_matrix(
  File "/Users/tingl/Documents/github/pytorch/.github/scripts/generate_binary_build_matrix.py", line 471, in generate_wheels_matrix
    "container_image": WHEEL_CONTAINER_IMAGES[arch_version],
KeyError: '12.1'
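
The traceback above is the kind of failure one would expect when a version is dropped from the container-image mapping while a workflow configuration still lists it. A minimal sketch of that failure mode, plus a check that reports it up front. The dictionary contents, image names, and the `validate_arches` helper are hypothetical and not taken from generate_binary_build_matrix.py.

```python
# Hypothetical stand-in for the mapping in generate_binary_build_matrix.py;
# the image names are placeholders, not real image tags.
WHEEL_CONTAINER_IMAGES = {
    "12.4": "example/manylinux-cuda12.4",
    "12.6": "example/manylinux-cuda12.6",
}

# Hypothetical list of versions that a smoke workflow still asks for.
SMOKE_WORKFLOW_ARCHES = ["12.1", "12.4"]


def validate_arches(requested, images):
    """Return the requested versions that have no container image."""
    return [arch for arch in requested if arch not in images]


missing = validate_arches(SMOKE_WORKFLOW_ARCHES, WHEEL_CONTAINER_IMAGES)
if missing:
    print(f"No container image for: {missing}")

# A direct lookup of a removed key fails the same way as in the traceback.
try:
    WHEEL_CONTAINER_IMAGES["12.1"]
except KeyError as err:
    print(f"KeyError: {err}")
```

Running a check like this before building the matrix would point at the stale workflow entry rather than at the lookup line.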

Skylion007 · 2024-11-03T17:27:12Z

.github/scripts/generate_binary_build_matrix.py

+        "nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-cudnn-cu12==9.1.0.70; platform_system == 'Linux' and platform_machine == 'x86_64' | "


This might be a good time to update cuDNN as well anyway?

ptrblck

No, let's not mix different updates (CUDA and cuDNN) into the same PR; let's follow up separately.
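
For readers unfamiliar with the pin format in the diff above: each entry is a package pinned with `==` followed by an environment marker, and entries are separated by `|`. A minimal sketch of splitting such a string into its parts, using two entries copied from the diff. The `parse_pins` helper is hypothetical and only uses the standard library; it is not how the build scripts consume the string.

```python
# Two entries copied from the diff above, separated by " | " as in the script.
PINS = (
    "nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
    "nvidia-cudnn-cu12==9.1.0.70; platform_system == 'Linux' and platform_machine == 'x86_64'"
)


def parse_pins(spec):
    """Split a '|'-separated pin string into (name, version, marker) tuples."""
    parsed = []
    for entry in spec.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        # Everything before ';' is the requirement, everything after is the marker.
        requirement, _, marker = entry.partition(";")
        name, _, version = requirement.partition("==")
        parsed.append((name.strip(), version.strip(), marker.strip()))
    return parsed


for name, version, marker in parse_pins(PINS):
    print(f"{name:28} {version:10} [{marker}]")
```

Seen this way, a cuDNN update is a change to a single entry, which is why it can be handled in a separate follow-up.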

.github/scripts/generate_binary_build_matrix.py

atalman

Add an exception in generate_conda_matrix to not include any 12.6 builds. We don't want to add new conda builds for 12.6
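
One way the requested exception could look, as a rough sketch only. The version lists, the `CONDA_EXCLUDED_ARCHES` set, the dictionary keys, and the function name are all assumptions for illustration; the real generate_conda_matrix has its own inputs and output shape.

```python
# Hypothetical inputs; the real script defines its own lists.
CUDA_ARCHES = ["11.8", "12.4", "12.6"]
CONDA_EXCLUDED_ARCHES = {"12.6"}  # assumption: versions that get no conda builds


def generate_conda_matrix_sketch(python_versions, arches):
    """Build (python, cuda) combinations, skipping excluded CUDA versions."""
    matrix = []
    for python_version in python_versions:
        for arch_version in arches:
            if arch_version in CONDA_EXCLUDED_ARCHES:
                continue  # no new conda builds for this CUDA version
            matrix.append(
                {"python_version": python_version, "gpu_arch_version": arch_version}
            )
    return matrix


matrix = generate_conda_matrix_sketch(["3.11", "3.12"], CUDA_ARCHES)
print(matrix)
```

Keeping the exclusion in a named set makes it easy to see which versions are deliberately left out of conda.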

.github/templates/common.yml.j2

tinglvv · 2024-11-08T19:54:03Z

The windows-binary-wheel error might be due to #138458, which set 12.4 as the default.

tinglvv · 2024-11-08T19:55:32Z

The Linux aarch64 failures should be resolved after correcting the build script for aarch64.
windows-conda-build fails with:
Run actions/upload-artifact@v4.4.0
Error: No files were found with the provided path: C:\actions-runner\_work\_temp/artifacts. No artifacts will be uploaded

tinglvv · 2024-11-08T21:55:09Z

@pytorchbot rebase

pytorchmergebot · 2024-11-08T21:56:39Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-11-08T21:56:40Z

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/138899/head returned non-zero exit code 1

Rebasing (1/16)
Rebasing (2/16)
Rebasing (3/16)
Rebasing (4/16)
Rebasing (5/16)
Rebasing (6/16)
Auto-merging .github/workflows/generated-linux-binary-conda-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-manywheel-main.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-binary-manywheel-main.yml
Auto-merging .github/workflows/generated-linux-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-binary-manywheel-nightly.yml
Auto-merging .github/workflows/generated-windows-binary-conda-nightly.yml
Auto-merging .github/workflows/generated-windows-binary-libtorch-debug-main.yml
Auto-merging .github/workflows/generated-windows-binary-libtorch-debug-nightly.yml
Auto-merging .github/workflows/generated-windows-binary-libtorch-release-main.yml
Auto-merging .github/workflows/generated-windows-binary-libtorch-release-nightly.yml
Auto-merging .github/workflows/generated-windows-binary-wheel-nightly.yml
error: could not apply 991a7019318... remove 12.1 from LINUX_BINARY_SMOKE_WORKFLOWS
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 991a7019318... remove 12.1 from LINUX_BINARY_SMOKE_WORKFLOWS

Raised by https://github.com/pytorch/pytorch/actions/runs/11750112212

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

nWEIdia · 2024-11-12T18:51:57Z

.github/scripts/generate_binary_build_matrix.py

+        "nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-cusparselt-cu12==0.6.3; platform_system == 'Linux' and platform_machine == 'x86_64' | "


We seem to be bumping cusparselt here as well. Watch for the unit test failures that https://hud.pytorch.org/pytorch/pytorch/pull/138175 is currently facing.

nWEIdia · 2024-11-12T18:54:33Z

.github/scripts/generate_binary_build_matrix.py

                )
-                # Special build building to use on Colab. Python 3.11 for 12.1 CUDA
-                if python_version == "3.11" and arch_version == "12.1":
+                # Special build building to use on Colab. Python 3.11 for 12.4 CUDA


This seems to depend on Colab's support matrix, e.g. does it support CUDA 12.4?
It may well support it, but it would be good to double-check.
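
A small sketch of how the condition in the diff above could be kept in one place, so that it is easy to adjust once Colab's supported CUDA version is confirmed. The `COLAB_BUILD` constant and the `needs_colab_build` helper are hypothetical names, not part of the script.

```python
# Assumption: the (Python, CUDA) pair that gets the extra Colab-oriented build.
# Adjust once Colab's supported CUDA version is confirmed.
COLAB_BUILD = ("3.11", "12.4")


def needs_colab_build(python_version, arch_version):
    """True when this combination should also get the special Colab build."""
    return (python_version, arch_version) == COLAB_BUILD


for combo in [("3.11", "12.1"), ("3.11", "12.4"), ("3.12", "12.4")]:
    print(combo, needs_colab_build(*combo))
```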

atalman · 2024-11-12T19:50:52Z

@pytorchmergebot merge -f "lint failure is expected"

pytorchmergebot · 2024-11-12T19:52:21Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Fixes lint after #138899, which broke due to a land race. Run ``./regenerate.sh``.
Pull Request resolved: #140446
Approved by: https://github.com/wdvr, https://github.com/huydhn, https://github.com/seemethere, https://github.com/malfet

Summary:
X-link: facebookresearch/FBGEMM#486
- Upgrade gcc version to support newer libstdc++, which is required now that pytorch/pytorch#141035 has landed
- Deprecate support for CUDA 12.1 and add support for 12.6, per changes in pytorch/pytorch#138899

Pull Request resolved: #3398
Reviewed By: sryap
Differential Revision: D66277492
Pulled By: q10
fbshipit-source-id: 24817efb5c07c1985ab3beeb1610879edbd81acc

johnnynunez · 2024-12-03T09:12:35Z

Which version is it in the end: 12.6, 12.6.2, or 12.6.3?
At CES 2025, RTX 50, RTX mobile, and maybe NVIDIA Arm hardware will be released, so I expect CUDA 12.7 to be released this month (December) and 12.8 to be released with the new hardware.

tinglvv · 2024-12-03T16:52:39Z

Hi @johnnynunez

> Which version is it in the end: 12.6, 12.6.2, or 12.6.3? At CES 2025, RTX 50, RTX mobile, and maybe NVIDIA Arm hardware will be released, so I expect CUDA 12.7 to be released this month (December) and 12.8 to be released with the new hardware.

For the x86 nightly build, it is now 12.6.3 (#141433). For Windows builds, it is 12.6.2, as the Windows AMI takes time to build and may not make it before the 2.6.0 code freeze. cc @atalman

Related to pytorch#138440
Issue tracker: pytorch#138609
Version based on https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
Pull Request resolved: pytorch#138899
Approved by: https://github.com/atalman
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

pytorch-bot bot added the "topic: not user facing" label Oct 25, 2024

pytorchbot added the "open source" label Oct 25, 2024

malfet reviewed Oct 25, 2024

.github/scripts/generate_binary_build_matrix.py (outdated, resolved)

malfet reviewed Oct 25, 2024

.github/scripts/generate_binary_build_matrix.py (resolved)

malfet reviewed Oct 25, 2024

.github/scripts/generate_binary_build_matrix.py (outdated, resolved)

malfet reviewed Oct 25, 2024

.github/scripts/generate_binary_build_matrix.py (outdated, resolved)

tinglvv added the "ciflow/binaries" label (Trigger all binary build and upload jobs on the PR) Oct 25, 2024

tinglvv marked this pull request as ready for review October 29, 2024 22:06

tinglvv requested a review from a team as a code owner October 29, 2024 22:06

tinglvv marked this pull request as draft October 29, 2024 22:19

Skylion007 reviewed Nov 3, 2024

atalman reviewed Nov 5, 2024

.github/scripts/generate_binary_build_matrix.py (resolved)

atalman reviewed Nov 5, 2024

atalman reviewed Nov 6, 2024

.github/templates/common.yml.j2 (outdated, resolved)

add cuda 12.6 to manywheel dockers

0970503

tinglvv marked this pull request as ready for review November 8, 2024 21:55

tinglvv force-pushed the cuda-12.6-ci branch from a1bb6e2 to 6c52452 Compare November 8, 2024 22:02

tinglvv and others added 4 commits November 8, 2024 15:51

Merge branch 'main' of https://github.com/tinglvv/pytorch

1df6693

add cuda 12.6 to ci

48b4ea8

Update .github/scripts/generate_binary_build_matrix.py

01f6bd3

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

Update .github/scripts/generate_binary_build_matrix.py

562b513

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

tinglvv changed the title ~~Add CUDA 12.6 to Binaries Matrix~~ Add CUDA 12.6 Linux Builds to Binaries Matrix Nov 12, 2024

nWEIdia reviewed Nov 12, 2024

pytorchmergebot added the merging label Nov 12, 2024

pytorchmergebot closed this in 14bb49f Nov 12, 2024

pytorchmergebot added Merged and removed merging labels Nov 12, 2024

atalman added a commit to atalman/pytorch that referenced this pull request Nov 12, 2024

Fix lint after pytorch#138899

8117057

atalman mentioned this pull request Nov 12, 2024

Fix lint after #138899 #140446

Closed

malfet mentioned this pull request Nov 17, 2024

Nightly builds missing from PyTorch cu121 repository since November 12, 2024 #140885

Closed

q10 mentioned this pull request Nov 21, 2024

[fbgemm_gpu] OSS build updates pytorch/FBGEMM#3398

Closed

malfet mentioned this pull request Nov 22, 2024

[RFC] Cuda support matrix for Release 2.6 #138609

Closed

atalman mentioned this pull request Dec 12, 2024

Enable CUDA 12.6 CI/CD , Disable CUDA 12.1 #138440

Closed

31 tasks
