[BE]: Update CU128 cudnn to 9.8.0.87 (#148963)
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148963
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures, 1 Unrelated Failure
As of commit 8b23833 with merge base f1787ee
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@tinglvv opened the most recent PR for updating CUDNN for 12.8 — any reason we didn't also update for 12.6? We previously had a version split due to ABI compatibility concerns from the manylinux upgrade, but that shouldn't be an issue anymore.
.ci/docker/common/install_cudnn.sh
Outdated
This should probably be merged with 12.8 too; there's no reason to keep 12.6 on an old CUDNN version when there are a lot of performance fixes in newer releases that apply to Hopper as well.
Force-pushed from 632751a to 8b23833
@jansel Should we update CU126's libraries in this PR or another one?
I would consider a separate PR; background is that 9.7+ is for Blackwell.
  "nvidia-cuda-runtime-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
  "nvidia-cuda-cupti-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
- "nvidia-cudnn-cu12==9.7.1.26; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+ "nvidia-cudnn-cu12==9.8.0.87; platform_system == 'Linux' and platform_machine == 'x86_64' | "
Changing this may need a synchronization point: @atalman usually helps us by uploading the 9.8.0.87 nvidia-cudnn-cu12 wheel first. Or has this already been done?
https://pypi.org/project/nvidia-cudnn-cu12/ looks updated with 9.8.0.87, so I think we are good on that front.
We need to upload it to our s3 bucket unfortunately.
@tinglvv for security reasons, all dependencies of torch need to live on https://download.pytorch.org/
Thanks for the explanation! Yes, then indeed we need 9.8.0.87 in https://download.pytorch.org/whl/nightly/nvidia-cudnn-cu12/
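(Aside: the `platform_system` / `platform_machine` clauses on the pins in this diff are standard PEP 508 environment markers, which pip evaluates at install time. A minimal sketch using the third-party `packaging` library — the same marker logic pip vendors — shows how the cudnn pin is gated to Linux x86_64; the requirement string is copied from this PR, and the sample environments are illustrative.)

```python
# Sketch: how pip decides whether the "nvidia-cudnn-cu12==9.8.0.87" pin applies.
# Requires the third-party `packaging` library; the requirement string is taken
# from this PR, the environment dicts below are illustrative.
from packaging.requirements import Requirement

req = Requirement(
    "nvidia-cudnn-cu12==9.8.0.87; "
    "platform_system == 'Linux' and platform_machine == 'x86_64'"
)

# The marker evaluates True on a Linux x86_64 machine, so pip installs the pin...
linux_x86 = {"platform_system": "Linux", "platform_machine": "x86_64"}
print(req.marker.evaluate(linux_x86))  # True

# ...and False elsewhere (e.g. macOS arm64), so those builds skip the cudnn wheel.
mac_arm = {"platform_system": "Darwin", "platform_machine": "arm64"}
print(req.marker.evaluate(mac_arm))  # False
```

This is why the macOS wheel builds in CI are unaffected by the cudnn bump: the marker simply evaluates to False there.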
nWEIdia left a comment:
LGTM. Just had a question on uploading the pypi cudnn wheel to AWS S3.
LGTM, if the ciflow/binaries pass then we are good to merge.
Smaller PRs would be easier.
Thanks for uploading the binaries @atalman, but it seems like the S3 bucket is returning a 403 error on the wheels.
@pytorchbot merge -i
atalman left a comment:
lgtm. Thank you @Skylion007
Merge started. Your change will be merged while ignoring the following 6 checks: pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge), macos-arm64-binary-wheel / wheel-py3_10-cpu-build, macos-arm64-binary-wheel / wheel-py3_11-cpu-build, macos-arm64-binary-wheel / wheel-py3_13-cpu-build, macos-arm64-binary-wheel / wheel-py3_12-cpu-build, macos-arm64-binary-wheel / wheel-py3_9-cpu-build. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Also, cu12.6 is on an old CUDNN version; we may want to upgrade it for all the performance reasons, as I don't see a manylinux reason to stay back on the old 9.5 release. I might split that into its own PR. This one just updates CU128 to the latest and greatest.