[SymmMem] Add multimem support for NCCL and NVSHMEM #172185

kwen2501 wants to merge 7 commits into gh/kwen2501/306/base from …
Conversation
Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/172185

Note: Links to docs will display an error until the docs builds have been completed.

As of commit 657e3d7 with merge base 8cfe6f1: ❌ 1 New Failure, 61 Pending, 1 Unrelated Failure.

NEW FAILURE - The following job has failed: …

FLAKY - The following job failed but was likely due to flakiness present on trunk: …

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. The first few are: … Dig deeper by viewing the failures on hud.

@pytorchbot merge

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m 'Sorry for reverting the change but I think it is failing vLLM benchmark job' -c nosignal

It fails with the error at https://github.com/pytorch/pytorch/actions/runs/20995910399/job/60431079759#step:15:15768. That job is supposed to use NCCL 2.28.9 (https://github.com/pytorch/pytorch/actions/runs/20995910399/job/60431079519#step:7:1134), coming from our pin in https://github.com/pytorch/pytorch/blob/main/.ci/docker/ci_commit_pins/nccl.txt. Let me know if this is an infra issue that we need to fix to unblock this change, because I only see it failing on DGX B200. cc @zou3519

@pytorchbot successfully started a revert job. Check the current status here.

Let me also check why https://github.com/pytorch/pytorch/actions/runs/20995910399/job/60431079759#step:15:15768 is reported as a success while it clearly should fail.

@kwen2501 your PR has been successfully reverted.
This reverts commit ed935ff. Reverted #172185 on behalf of https://github.com/huydhn due to Sorry for reverting the change but I think it is failing vLLM benchmark job ([comment](#172185 (comment)))
…172185)" This reverts commit 1c83214. Reverted pytorch#172185 on behalf of https://github.com/Skylion007 due to breaking CI builds with new nvshmem. See pytorch#172348 ([comment](pytorch#172185 (comment)))
Pull Request resolved: pytorch#172185 Approved by: https://github.com/Skylion007, https://github.com/dzmitry-huba ghstack dependencies: pytorch#172163
…172185)" This reverts commit ed935ff. Reverted pytorch#172185 on behalf of https://github.com/huydhn due to Sorry for reverting the change but I think it is failing vLLM benchmark job ([comment](pytorch#172185 (comment)))
Hi @huydhn, sorry, the break is caused by the PR dropping support for some APIs.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed; the first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 3, 6, linux.rocm.gpu.gfx942.1). Details for Dev Infra team: raised by workflow job.

@pytorchbot merge -f "Failed Rocm test/inductor/test_cuda_repro.py has been identified as flaky"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Cherry-picked from upstream main:
- [SymmMem] Back symm_mem.empty() with implicit pool (pytorch#172292): automatic memory reuse for symmetric memory allocations
- [SymmMem] Add multimem support for NCCL and NVSHMEM (pytorch#172185): enhanced multi-GPU memory support
- [inductor] Basic Comm Buffer Reuse for Symmetric Memory (pytorch#171909): memory optimization for torch.compile with symmetric buffers
- [BE] Don't print 12 `triton not found` on import (pytorch#172614): QoL fix for flop_counter imports
- [inductor] Use custom triton kernel subclass when available (pytorch#167456): enables custom backend heuristics for Triton kernels

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stack from ghstack (oldest at bottom):