[SymmMem] Back symm_mem.empty() with implicit pool #172292
kwen2501 wants to merge 2 commits into gh/kwen2501/308/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172292
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Unrelated Failure
As of commit fe394be with merge base 8cfe6f1:
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
eqy
left a comment
Does this need to use any special custom allocator or is just having another pool OK?
@eqy The pool is backed by the allocator shown in torch/distributed/_symmetric_memory/__init__.py, Lines 2067 to 2070 in b1f19d8.
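To illustrate the distinction raised above (a special custom allocator vs. just another pool), here is a minimal pure-Python sketch of a pool that delegates raw allocation to a pluggable backend allocator. This is not PyTorch's implementation; `BackedPool` and `nvshmem_like_alloc` are hypothetical names used only for illustration.

```python
# Toy sketch (not PyTorch's code): a pool whose blocks come from a pluggable
# allocator callback, mirroring the idea of constructing a MemPool around a
# custom backend allocator rather than the default caching allocator.

class BackedPool:
    """A pool that delegates raw allocation to a backend allocator."""

    def __init__(self, alloc_fn):
        self.alloc_fn = alloc_fn   # backend allocator, e.g. an NVSHMEM-style alloc
        self.free_blocks = {}      # size -> list of reusable block ids

    def allocate(self, size):
        # Reuse a cached block of the same size if one is available...
        blocks = self.free_blocks.get(size)
        if blocks:
            return blocks.pop()
        # ...otherwise fall back to the backend allocator.
        return self.alloc_fn(size)

    def release(self, size, block):
        # Return the block to the pool instead of the backend.
        self.free_blocks.setdefault(size, []).append(block)


backend_calls = []

def nvshmem_like_alloc(size):
    """Hypothetical backend allocator; just records calls here."""
    backend_calls.append(size)
    return f"block-{len(backend_calls)}"

pool = BackedPool(nvshmem_like_alloc)
a = pool.allocate(1024)      # first allocation hits the backend
pool.release(1024, a)
b = pool.allocate(1024)      # served from the pool, no backend call
```

The point of the sketch: the pool itself is ordinary; what makes it "symmetric memory" is the backend allocator it is constructed over.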
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "sorry but it seems your pr failed internal test. error: torch._inductor.exc.InductorError: ImportError: undefined symbol: cuCtxGetCurrent, please reach out to internal staff for further debugging" -c ghfirst

@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 4301818. Reverted #172292 on behalf of https://github.com/yangw-dev due to: sorry but it seems your pr failed internal test. error: torch._inductor.exc.InductorError: ImportError: undefined symbol: cuCtxGetCurrent, please reach out to internal staff for further debugging ([comment](#172292 (comment)))
@kwen2501 your PR has been successfully reverted.
@kwen2501 the error is actually in dynamo tests, e.g. test/dynamo:test_aot_autograd_cache - test_autograd_function (caffe2.test.dynamo.test_aot_autograd_cache.AOTAutogradCacheBundledTests), so it should be reproducible. Don't know tbh why it is triggered by this PR.
Resolves pytorch#172050

Two motivations:
- Give better UX and perf to users who explicitly use `symm_mem.empty()`.
- Simplify the code generated by Inductor, i.e. `symm_mem.empty()` would automatically reuse memory, rather than requiring Inductor to bookkeep it.

The MemPool infra for all CUDA backends (`CUDA`, `NVSHMEM`, `NCCL`) has been built previously.

Pull Request resolved: pytorch#172292
Approved by: https://github.com/ngimel, https://github.com/dzmitry-huba
ghstack dependencies: pytorch#172163
…72292)" This reverts commit 4301818. Reverted pytorch#172292 on behalf of https://github.com/yangw-dev due to: sorry but it seems your pr failed internal test. error: torch._inductor.exc.InductorError: ImportError: undefined symbol: cuCtxGetCurrent, please reach out to internal staff for further debugging ([comment](pytorch#172292 (comment)))
Hi @yangw-dev thanks for the heads-up! My PR did not modify anything related to Triton, nor did it change CUDA driver linkage.
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Rebase failed. Raised by https://github.com/pytorch/pytorch/actions/runs/21079055965
Hmm, I hope it's not related to #171116 ...
@pytorchbot merge -f "Triaged CI error; seems unrelated"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Cherry-picked from upstream main:
- [SymmMem] Back symm_mem.empty() with implicit pool (pytorch#172292): automatic memory reuse for symmetric memory allocations
- [SymmMem] Add multimem support for NCCL and NVSHMEM (pytorch#172185): enhanced multi-GPU memory support
- [inductor] Basic Comm Buffer Reuse for Symmetric Memory (pytorch#171909): memory optimization for torch.compile with symmetric buffers
- [BE] Don't print 12 `triton not found` on import (pytorch#172614): QoL fix for flop_counter imports
- [inductor] Use custom triton kernel subclass when available (pytorch#167456): enables custom backend heuristics for Triton kernels

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stack from ghstack (oldest at bottom):
Resolves #172050
Two motivations:
- Give better UX and perf to users who explicitly use `symm_mem.empty()`.
- Simplify the code generated by Inductor, i.e. `symm_mem.empty()` would automatically reuse memory, rather than requiring Inductor to bookkeep it.

The MemPool infra for all CUDA backends (`CUDA`, `NVSHMEM`, `NCCL`) has been built previously.