Use same NVSHMEM version across CUDA builds #162206
kwen2501 wants to merge 3 commits into gh/kwen2501/230/base
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162206
✅ You can merge normally! (2 Unrelated Failures) As of commit 194cd4e with merge base 1f0b01d, the failed jobs were FLAKY: likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@tinglvv Does this PR make sense? Can you please review? Thanks! |
|
Thanks! Looks good. |
|
@pytorchbot merge |
Merge failed. Reason: Approvers from one of the following sets are needed:
|
|
Perfect, I was planning on doing this anyway: pytorch/.ci/docker/common/install_cuda.sh, line 13 in b04e922 |
|
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). |
Merge failed. Reason: 1 job has failed: linux-binary-manywheel / manywheel-py3_12-cuda12_8-test / test |
|
@atalman We need some new S3 uploads for nvidia wheels |
|
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). |
Merge failed. Reason: Command Details for Dev Infra team. Raised by workflow job. |
This PR bumps the NVSHMEM version to 3.3.24 for CUDA 12 as well. Pull Request resolved: #162206 Approved by: https://github.com/tinglvv, https://github.com/Skylion007 ghstack-source-id: d5589c4
|
There is a land conflict which leads to mismatched yml generation. |
|
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). |
Merge failed. Reason: 1 job has failed: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable) |
|
@kwen2501 A new NVSHMEM release just dropped on PyPI that can use IBGDA on more devices. Should we upgrade it across the board? |
|
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). |
|
@Skylion007 Thanks! |
Let's do it |
pytorch#161321 bumped NVSHMEM version to 3.3.24 for CUDA 13, leaving CUDA 12 with 3.3.20. This PR bumps the NVSHMEM version to 3.3.24 for CUDA 12 as well. Pull Request resolved: pytorch#162206 Approved by: https://github.com/tinglvv, https://github.com/Skylion007
This reverts commit 0d9c95c. Reverted pytorch#162206 on behalf of https://github.com/malfet due to Broke lint, see https://hud.pytorch.org/hud/pytorch/pytorch/4dd73e659a8fd4872e5f49cfd72e420fa7c4e6c9/1?per_page=50&name_filter=workflow-checks ([comment](pytorch#162206 (comment)))
pytorch#161321 bumped NVSHMEM version to 3.3.24 for CUDA 13, leaving CUDA 12 with 3.3.20. This PR bumps the NVSHMEM version to 3.3.24 for CUDA 12 as well. Pull Request resolved: pytorch#162206 Approved by: https://github.com/tinglvv, https://github.com/Skylion007
Stack from ghstack (oldest at bottom):
#161321 bumped NVSHMEM version to 3.3.24 for CUDA 13, leaving CUDA 12 with 3.3.20.
This PR bumps the NVSHMEM version to 3.3.24 for CUDA 12 as well.
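As a rough illustration of the invariant this PR restores, the sketch below checks that every CUDA build pins the same NVSHMEM version. The pin tables, the requirement strings, and the helper names (`pinned_version`, `is_consistent`) are hypothetical and are not taken from the PyTorch build scripts; only the version numbers 3.3.20 and 3.3.24 come from the description above.

```python
import re

# Hypothetical pin tables: CUDA build -> requirement string.
# The layout and package names are illustrative assumptions only.
PINS_BEFORE = {
    "cuda12": "nvidia-nvshmem-cu12==3.3.20",
    "cuda13": "nvidia-nvshmem-cu13==3.3.24",
}
PINS_AFTER = {
    "cuda12": "nvidia-nvshmem-cu12==3.3.24",
    "cuda13": "nvidia-nvshmem-cu13==3.3.24",
}


def pinned_version(requirement: str) -> str:
    """Extract the version from an exact 'name==X.Y.Z' pin."""
    match = re.fullmatch(r"[A-Za-z0-9_.-]+==([0-9]+(?:\.[0-9]+)*)", requirement)
    if match is None:
        raise ValueError(f"not an exact version pin: {requirement!r}")
    return match.group(1)


def is_consistent(pins: dict[str, str]) -> bool:
    """Return True when all builds pin the same version."""
    versions = {pinned_version(req) for req in pins.values()}
    return len(versions) <= 1


if __name__ == "__main__":
    print("before:", is_consistent(PINS_BEFORE))
    print("after:", is_consistent(PINS_AFTER))
```

A check along these lines could run in lint or CI so that a future bump for one CUDA version does not silently leave another behind; whether that is worth adding here is a separate question.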