[SymmMem] Back symm_mem.empty() with implicit pool #172292
kwen2501 wants to merge 2 commits into gh/kwen2501/308/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172292
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Unrelated Failure
As of commit fe394be with merge base 8cfe6f1:
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
eqy
left a comment
Does this need to use any special custom allocator or is just having another pool OK?
@eqy The pool is backed by the allocator shown in torch/distributed/_symmetric_memory/__init__.py, Lines 2067 to 2070 in b1f19d8.
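To illustrate the distinction raised above (a special custom allocator vs. just another pool), here is a minimal pure-Python sketch of a pool that delegates raw allocation to a pluggable backend allocator. This is not PyTorch's implementation; `BackedPool` and `nvshmem_like_alloc` are hypothetical names used only for illustration.

```python
# Toy sketch (not PyTorch's code): a pool whose blocks come from a pluggable
# allocator callback, mirroring the idea of constructing a MemPool around a
# custom backend allocator rather than the default caching allocator.

class BackedPool:
    """A pool that delegates raw allocation to a backend allocator."""

    def __init__(self, alloc_fn):
        self.alloc_fn = alloc_fn   # backend allocator, e.g. an NVSHMEM-style alloc
        self.free_blocks = {}      # size -> list of reusable block ids

    def allocate(self, size):
        # Reuse a cached block of the same size if one is available...
        blocks = self.free_blocks.get(size)
        if blocks:
            return blocks.pop()
        # ...otherwise fall back to the backend allocator.
        return self.alloc_fn(size)

    def release(self, size, block):
        # Return the block to the pool instead of the backend.
        self.free_blocks.setdefault(size, []).append(block)


backend_calls = []

def nvshmem_like_alloc(size):
    """Hypothetical backend allocator; just records calls here."""
    backend_calls.append(size)
    return f"block-{len(backend_calls)}"

pool = BackedPool(nvshmem_like_alloc)
a = pool.allocate(1024)      # first allocation hits the backend
pool.release(1024, a)
b = pool.allocate(1024)      # served from the pool, no backend call
```

The point of the sketch: the pool itself is ordinary; what makes it "symmetric memory" is the backend allocator it is constructed over.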
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "sorry but it seems your pr failed internal test. error: torch._inductor.exc.InductorError: ImportError: undefined symbol: cuCtxGetCurrent, please reach out to internal staff for further debugging" -c ghfirst

@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 4301818. Reverted #172292 on behalf of https://github.com/yangw-dev due to: sorry but it seems your pr failed internal test. error: torch._inductor.exc.InductorError: ImportError: undefined symbol: cuCtxGetCurrent, please reach out to internal staff for further debugging ([comment](#172292 (comment)))
@kwen2501 your PR has been successfully reverted.
@kwen2501 the error is actually in dynamo tests, e.g. test/dynamo:test_aot_autograd_cache - test_autograd_function (caffe2.test.dynamo.test_aot_autograd_cache.AOTAutogradCacheBundledTests), so it should be reproducible. Don't know tbh why it is triggered by this PR.
Resolves pytorch#172050

Two motivations:
- Give better UX and perf to users who explicitly use `symm_mem.empty()`.
- Simplify the code generated by Inductor, i.e. `symm_mem.empty()` would automatically reuse memory, rather than requiring Inductor to bookkeep it.

The MemPool infra for all CUDA backends (`CUDA`, `NVSHMEM`, `NCCL`) has been built previously.

Pull Request resolved: pytorch#172292
Approved by: https://github.com/ngimel, https://github.com/dzmitry-huba
ghstack dependencies: pytorch#172163
…72292)" This reverts commit 4301818. Reverted pytorch#172292 on behalf of https://github.com/yangw-dev due to: sorry but it seems your pr failed internal test. error: torch._inductor.exc.InductorError: ImportError: undefined symbol: cuCtxGetCurrent, please reach out to internal staff for further debugging ([comment](pytorch#172292 (comment)))
Hi @yangw-dev thanks for the heads-up! My PR did not modify anything related to Triton, nor did it change CUDA driver linkage.
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Rebase failed. Raised by https://github.com/pytorch/pytorch/actions/runs/21079055965
Hmm, I hope it's not related to #171116 ...
@pytorchbot merge -f "Triaged CI error; seems unrelated"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Cherry-picked from upstream main:
- [SymmMem] Back symm_mem.empty() with implicit pool (pytorch#172292): automatic memory reuse for symmetric memory allocations
- [SymmMem] Add multimem support for NCCL and NVSHMEM (pytorch#172185): enhanced multi-GPU memory support
- [inductor] Basic Comm Buffer Reuse for Symmetric Memory (pytorch#171909): memory optimization for torch.compile with symmetric buffers
- [BE] Don't print 12 `triton not found` on import (pytorch#172614): QoL fix for flop_counter imports
- [inductor] Use custom triton kernel subclass when available (pytorch#167456): enables custom backend heuristics for Triton kernels

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stack from ghstack (oldest at bottom):
Resolves #172050
Two motivations:
- Give better UX and perf to users who explicitly use `symm_mem.empty()`.
- Simplify the code generated by Inductor, i.e. `symm_mem.empty()` would automatically reuse memory, rather than requiring Inductor to bookkeep it.

The MemPool infra for all CUDA backends (`CUDA`, `NVSHMEM`, `NCCL`) has been built previously.