🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149778
Note: Links to docs will display an error until the docs builds have been completed.
⏳ 32 Pending, 1 Unrelated Failure
As of commit e40be77 with merge base db9b031:
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
  set -ex

- NCCL_VERSION=v2.25.1-1
+ NCCL_VERSION=v2.26.2-1
@Skylion007 aarch64 uses the same script: https://github.com/pytorch/pytorch/actions/runs/14011332351/job/39231501156#step:6:1018
So we should be good.
Oh, is this script redundant then? https://github.com/pytorch/pytorch/blob/db9b031b0097d68c14f32d9c58609b8b89930426/.ci/docker/common/install_cuda_aarch64.sh
Yes, you are correct. This is still used. We will be consolidating these scripts: #149554
Looks good once the corresponding arm64 script is also updated.
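Since the x86 and arm64 install scripts each carry their own `NCCL_VERSION` line, a small check could catch the two drifting apart until they are consolidated. The following is a minimal sketch, not part of this PR: the helper names `nccl_version_in` and `same_nccl_version` are hypothetical, and it assumes each script assigns `NCCL_VERSION=` on a line of its own, as in the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of this PR.

# Print the value of the first NCCL_VERSION=... assignment in a script.
nccl_version_in() {
  sed -n 's/^[[:space:]]*NCCL_VERSION=\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Succeed only if every listed script pins the same, non-empty NCCL_VERSION.
same_nccl_version() {
  local expected script
  expected="$(nccl_version_in "$1")"
  [ -n "$expected" ] || return 1
  for script in "$@"; do
    [ "$(nccl_version_in "$script")" = "$expected" ] || return 1
  done
}

# Example call (the first path is an assumption; only the aarch64 script
# is named in this thread):
#   same_nccl_version .ci/docker/common/install_cuda.sh \
#                     .ci/docker/common/install_cuda_aarch64.sh
```

Only the first assignment in each file is compared, so a script that sets the variable more than once would need a stricter check.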
    maybe_libnccl_dev="libnccl2=2.15.5-1+cuda11.8 libnccl-dev=2.15.5-1+cuda11.8 --allow-downgrades --allow-change-held-packages"
  elif [[ "$UBUNTU_VERSION" == "20.04"* && "$CUDA_VERSION" == "12.4"* ]]; then
-   maybe_libnccl_dev="libnccl2=2.25.1-1+cuda12.4 libnccl-dev=2.25.1-1+cuda12.4 --allow-downgrades --allow-change-held-packages"
+   maybe_libnccl_dev="libnccl2=2.26.2-1+cuda12.4 libnccl-dev=2.26.2-1+cuda12.4 --allow-downgrades --allow-change-held-packages"
These libs come from: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/
Validated that these binaries exist for CUDA 12.4.
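The same validation could be scripted against an apt `Packages` index. This is a rough sketch under stated assumptions: `has_deb_version` is a hypothetical helper, and it assumes the repository serves a plain-text `Packages` file whose stanzas list `Package:` before `Version:`, as is conventional for Debian-style repositories.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this PR: read an apt Packages index on
# stdin and succeed if it lists the given package at the given version.
has_deb_version() {
  local pkg="$1" version="$2"
  awk -v pkg="$pkg" -v ver="$version" '
    $1 == "Package:" { current = $2 }
    $1 == "Version:" && current == pkg && $2 == ver { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example call (assumes a Packages file exists at this location):
#   curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/Packages \
#     | has_deb_version libnccl2 2.26.2-1+cuda12.4
```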
I believe we can remove this if case as a followup since we don't have CUDA 12.4 in our CI anymore
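The bump in this PR touches two spellings of the same release: the tag-style `NCCL_VERSION=v2.26.2-1` and the apt pin `2.26.2-1+cuda12.4`. As a minimal sketch of how the pin might be derived from the tag so the two cannot diverge, consider the hypothetical helper below. `nccl_apt_pin` is not in the repository, and the mapping (drop the leading `v`, append `+cuda<major.minor>`) is inferred only from the values shown in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this PR: build the libnccl apt pins from
# the tag-style NCCL version and the CUDA version.
nccl_apt_pin() {
  local nccl_tag="$1"      # e.g. v2.26.2-1, as in NCCL_VERSION
  local cuda_version="$2"  # e.g. 12.4 or 12.4.1
  local cuda_mm
  # Keep only major.minor, mirroring the "12.4"* prefix match in the script.
  cuda_mm="$(echo "$cuda_version" | cut -d. -f1,2)"
  # Strip the leading "v" from the tag and append the CUDA suffix.
  local deb_version="${nccl_tag#v}+cuda${cuda_mm}"
  echo "libnccl2=${deb_version} libnccl-dev=${deb_version}"
}

# Example:
#   maybe_libnccl_dev="$(nccl_apt_pin "$NCCL_VERSION" "$CUDA_VERSION") --allow-downgrades --allow-change-held-packages"
```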
@pytorchmergebot merge -f "lint and docker builds are green"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
@pytorchbot cherry-pick --onto release/2.7 --fixes "nccl update" -c critical
Related to #149153
This updates some build scripts to hopefully fix the nightly builds which are somehow building against nccl 2.25.1 and using 2.26.2 from pip.
Test plan:
After merging rerun nightly linux jobs and validate that nccl version matches
Pull Request resolved: #149778
Approved by: https://github.com/Skylion007, https://github.com/atalman
Co-authored-by: Andrey Talman <atalman@fb.com>
(cherry picked from commit ddc0fe9)
Cherry picking #149778
The cherry pick PR is at #149874 and it is linked with issue nccl update.
ci/docker: use NCCL 2.26.2-1 (#149778)
Related to #149153
This updates some build scripts to hopefully fix the nightly builds which are somehow building against nccl 2.25.1 and using 2.26.2 from pip.
Test plan:
After merging rerun nightly linux jobs and validate that nccl version matches
Pull Request resolved: #149778
Approved by: https://github.com/Skylion007, https://github.com/atalman
Co-authored-by: Andrey Talman <atalman@fb.com>
(cherry picked from commit ddc0fe9)
Co-authored-by: Tristan Rice <rice@fn.lc>
Related to #149153
This updates some build scripts to hopefully fix the nightly builds which are somehow building against nccl 2.25.1 and using 2.26.2 from pip.
Test plan:
After merging, rerun the nightly Linux jobs and validate that the NCCL version the build is compiled against matches the one installed from pip.
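One way the validation step in the test plan could be scripted is sketched below. The helper names are hypothetical, and the usage lines rest on assumptions not stated in this PR: that `torch.cuda.nccl.version()` reports the NCCL version PyTorch was built against as a tuple such as `(2, 26, 2)`, and that the pip wheel in question is `nvidia-nccl-cu12`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of this PR.

# Reduce "(2, 26, 2)", "2.26.2", "v2.26.2-1" or "2.26.2-1+cuda12.4" to "2.26.2".
normalize_nccl_version() {
  echo "$1" | tr -d '() v' | tr ',' '.' | sed 's/[-+].*$//'
}

# Succeed if both version strings name the same NCCL release.
nccl_versions_match() {
  [ "$(normalize_nccl_version "$1")" = "$(normalize_nccl_version "$2")" ]
}

# Example usage inside a nightly job (requires torch and pip; see the
# assumptions above):
#   built="$(python -c 'import torch; print(torch.cuda.nccl.version())')"
#   installed="$(pip show nvidia-nccl-cu12 | sed -n 's/^Version: //p')"
#   nccl_versions_match "$built" "$installed" \
#     || echo "NCCL mismatch: built against $built, pip has $installed"
```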