[SymmMem] Deprecate enable_symm_mem_for_group #172163
kwen2501 wants to merge 3 commits into gh/kwen2501/305/base
Conversation

✅ No failures as of commit 945de0e with merge base 8cfe6f1 (per Dr. CI). See artifacts and rendered test results at hud.pytorch.org/pr/172163.
```cpp
// For logging only
static int exchanged_n_times = 0;
auto global_rank = get_group_info("0").rank;
auto store = group_info.store;
// Exchange rank to global rank mapping for this group.
// If it is already available, skip the exchange.
if (group_info.rank_to_global_rank.empty()) {
  group_info.rank_to_global_rank =
      storeExchange.all_gather(store, rank_, world_size_, global_rank);
  exchanged_n_times++;
  if (rank_ == 0) {
    LOG(INFO) << "[rank " << rank_ << ']'
              << " rank_to_global_rank: " << group_info.rank_to_global_rank
              << ", group_name: " << group_name_
              << ", exchanged_n_times: " << exchanged_n_times;
  }
}

TORCH_INTERNAL_ASSERT(!group_info.rank_to_global_rank.empty());
rank_to_global_rank_ = group_info.rank_to_global_rank;
```
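The store-based exchange above can be sketched in Python. Here a plain dict stands in for the process group's key-value store, and `all_gather` is a hypothetical helper mirroring the `storeExchange.all_gather` call, not the actual PyTorch API:

```python
# Minimal sketch of a store-based all-gather, assuming a dict-like
# key-value store shared by all ranks (stand-in for the real store).
def all_gather(store, rank, world_size, value):
    # Each rank publishes its value under a rank-specific key...
    store[f"gather/{rank}"] = value
    # ...then reads every rank's entry to build the full mapping.
    # (A real implementation would block until all keys exist.)
    return [store[f"gather/{r}"] for r in range(world_size)]

# Simulate 4 ranks in one process: rank r's "global rank" is r + 8,
# as if this group were a slice of a larger world.
store = {}
for r in range(1, 4):
    store[f"gather/{r}"] = r + 8
rank_to_global_rank = all_gather(store, rank=0, world_size=4, value=8)
# rank_to_global_rank now maps group-local rank -> global rank.
```

Because the result is cached on the group info, the exchange runs once per group rather than once per handle, which is what the `exchanged_n_times` counter is logging.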
NCCL has no need for this mapping.
```cpp
auto group_info = get_group_info("0");
auto store = group_info.store;
```
Removing dead code
```cpp
rank_to_global_rank_dev_ = reinterpret_cast<int*>(
    c10::cuda::CUDACachingAllocator::raw_alloc(sizeof(int) * world_size_));
AT_CUDA_CHECK(cudaMemcpy(
    rank_to_global_rank_dev_,
    rank_to_global_rank_.data(),
    sizeof(int) * world_size_,
    cudaMemcpyHostToDevice));
```
This device-side mapping doesn't need to be repeatedly allocated per-handle. I made it per-group above.
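The per-group (rather than per-handle) caching described here can be sketched in Python with a memoized factory; `get_rank_to_global_rank_buffer` and the group-name key are illustrative stand-ins, not the actual PyTorch internals:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_rank_to_global_rank_buffer(group_name: str) -> list:
    # Stand-in for the expensive device-side allocation + copy.
    # lru_cache keys on group_name, so the buffer is created once
    # per group, no matter how many handles request it.
    return [0, 1, 2, 3]

a = get_rank_to_global_rank_buffer("0")
b = get_rank_to_global_rank_buffer("0")
# a and b are the same cached object: one allocation per group.
```

The same idea in the C++ code is moving the buffer from the per-handle object to the per-group info struct.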
```cpp
// Note this field is not automatically populated by set_group_info(). If a
// SymmetricMemory implementation needs to use it, it must be populated by a
// call to exchange_global_ranks() first.
std::vector<int> rank_to_global_rank;
```
This is only needed by NVSHMEM. I made it an internal impl.
Skylion007 left a comment:
Definitely better than before
```cpp
};

// A map from group name to rank-to-global rank mapping
static std::unordered_map<std::string, std::vector<int>>
    rank_to_global_rank_map{};
```
Any reason this map needs to be referenced outside the allocator instance?
It is referenced by two classes now: NVSHMEMPeerAllocInfo and NVSHMEMSymmetricMemory.
Bad code smell right there for a global static map
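One way to address the flagged global static map is to hang the cache off the allocator object itself, so its lifetime is tied to the allocator instance rather than the process. A hypothetical Python sketch (class and method names are illustrative, not PyTorch's):

```python
class AllocatorSketch:
    """Owns the group-name -> rank mapping cache as instance state."""

    def __init__(self):
        # Cache lives on the instance, not at module scope.
        self._rank_to_global_rank = {}

    def get_mapping(self, group_name, exchange):
        # Run the (expensive) exchange only on first request per group.
        if group_name not in self._rank_to_global_rank:
            self._rank_to_global_rank[group_name] = exchange()
        return self._rank_to_global_rank[group_name]

alloc = AllocatorSketch()
calls = []

def exchange():
    calls.append(1)  # count how many times the exchange actually runs
    return [0, 1, 2, 3]

m1 = alloc.get_mapping("0", exchange)
m2 = alloc.get_mapping("0", exchange)
```

Classes that need the mapping would then query the allocator instead of touching a shared static, which keeps ownership in one place.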
```python
@deprecated(
    "`enable_symm_mem_for_group` is deprecated. There is no need to call this function anymore."
)
```
Specify the warning type in the `deprecated` decorator as `FutureWarning`, since the function will likely be removed soon.
@pytorchbot merge
Resolves pytorch#171827. `enable_symm_mem_for_group` is for getting access to the store of a group. But the store can also be retrieved by `ProcessGroup.getStore()` in C++, so it makes little sense to require users to call `enable_symm_mem_for_group`. Pull Request resolved: pytorch#172163. Approved by: https://github.com/Skylion007
Pull Request resolved: #172185. Approved by: https://github.com/Skylion007, https://github.com/dzmitry-huba. ghstack dependencies: #172163

Resolves #172050. Two motivations:
- Give better UX and perf to users who explicitly use `symm_mem.empty()`.
- Simplify the code generated by Inductor, i.e. `symm_mem.empty()` would automatically reuse memory, rather than requiring Inductor to bookkeep it.

The MemPool infra for all CUDA backends (`CUDA`, `NVSHMEM`, `NCCL`) has been built previously. Pull Request resolved: #172292. Approved by: https://github.com/ngimel, https://github.com/dzmitry-huba. ghstack dependencies: #172163

Fixes #172398. `NCCL_DEV_COMM_REQUIREMENTS_INITIALIZER` is available in NCCL 2.29. Pull Request resolved: #172400. Approved by: https://github.com/dzmitry-huba, https://github.com/fduwjj. ghstack dependencies: #172163
Stack from ghstack (oldest at bottom):

Resolves #171827

`enable_symm_mem_for_group` is for getting access to the store of a group. But the store can also be retrieved by `ProcessGroup.getStore()` in C++, so it makes little sense to require users to call `enable_symm_mem_for_group`.

cc @Skylion007. Thanks for pointing out the inconvenience.