[BE]: Update CU128 cudnn to 9.8.0.87 (#148963)
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148963
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures, 1 Unrelated Failure
As of commit 8b23833 with merge base f1787ee
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@tinglvv opened the most recent PR for updating CUDNN for 12.8 — any reason we didn't also update for 12.6? We previously had a version split due to ABI compatibility concerns from the manylinux upgrade, but that shouldn't be an issue anymore.
.ci/docker/common/install_cudnn.sh
Outdated
This should probably be merged with 12.8 too; there's no reason to keep 12.6 on an old CUDNN version when there are a lot of performance fixes in newer releases that apply to Hopper as well.
Force-pushed from 632751a to 8b23833
@jansel Should we update CU126's libraries in this PR or another one?
I would consider a separate PR; background is that 9.7+ is for Blackwell.
  "nvidia-cuda-runtime-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
  "nvidia-cuda-cupti-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
- "nvidia-cudnn-cu12==9.7.1.26; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+ "nvidia-cudnn-cu12==9.8.0.87; platform_system == 'Linux' and platform_machine == 'x86_64' | "
Changing this may need a synchronization point: @atalman usually helps us by uploading the 9.8.0.87 nvidia-cudnn-cu12 wheel first. Or has this already been done?
https://pypi.org/project/nvidia-cudnn-cu12/ looks updated with 9.8.0.87, so I think we are good on that front.
We need to upload it to our s3 bucket unfortunately.
@tinglvv for security reasons, all dependencies of torch need to live on https://download.pytorch.org/
Thanks for the explanation! Yes, then indeed we need 9.8.0.87 in https://download.pytorch.org/whl/nightly/nvidia-cudnn-cu12/
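(Aside: the `platform_system` / `platform_machine` clauses on the pins in this diff are standard PEP 508 environment markers, which pip evaluates at install time. A minimal sketch using the third-party `packaging` library — the same marker logic pip vendors — shows how the cudnn pin is gated to Linux x86_64; the requirement string is copied from this PR, and the sample environments are illustrative.)

```python
# Sketch: how pip decides whether the "nvidia-cudnn-cu12==9.8.0.87" pin applies.
# Requires the third-party `packaging` library; the requirement string is taken
# from this PR, the environment dicts below are illustrative.
from packaging.requirements import Requirement

req = Requirement(
    "nvidia-cudnn-cu12==9.8.0.87; "
    "platform_system == 'Linux' and platform_machine == 'x86_64'"
)

# The marker evaluates True on a Linux x86_64 machine, so pip installs the pin...
linux_x86 = {"platform_system": "Linux", "platform_machine": "x86_64"}
print(req.marker.evaluate(linux_x86))  # True

# ...and False elsewhere (e.g. macOS arm64), so those builds skip the cudnn wheel.
mac_arm = {"platform_system": "Darwin", "platform_machine": "arm64"}
print(req.marker.evaluate(mac_arm))  # False
```

This is why the macOS wheel builds in CI are unaffected by the cudnn bump: the marker simply evaluates to False there.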
nWEIdia left a comment:
LGTM. Just had a question on uploading the pypi cudnn wheel to AWS S3.
LGTM, if the ciflow/binaries pass then we are good to merge.
Smaller PRs would be easier.
Thanks for uploading the binaries @atalman, but it seems like the S3 bucket is returning a 403 error on the wheels.
@pytorchbot merge -i
atalman left a comment:
lgtm. Thank you @Skylion007
Merge started. Your change will be merged while ignoring the following 6 checks: pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge), macos-arm64-binary-wheel / wheel-py3_10-cpu-build, macos-arm64-binary-wheel / wheel-py3_11-cpu-build, macos-arm64-binary-wheel / wheel-py3_13-cpu-build, macos-arm64-binary-wheel / wheel-py3_12-cpu-build, macos-arm64-binary-wheel / wheel-py3_9-cpu-build. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Also, cu12.6 is on an old CUDNN version; we may want to upgrade it for all the performance reasons, as I don't see a manylinux reason to stay back on the old 9.5 release. I might split that into its own PR. This one just updates CU128 to the latest and greatest.