Make DeviceMesh opaque #169867
Conversation
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169867
✅ No failures as of commit 07d0720 with merge base dbf7019.
By marking DTensorSpec as a value-type opaque object, the exported graph looks like:
```
def forward(self, b_buffer, x):
_assert_tensor_metadata_default = torch.ops.aten._assert_tensor_metadata.default(x, dtype = torch.float64, device = device(type='cpu'), layout = torch.strided); _assert_tensor_metadata_default = None
to = torch.ops.aten.to.dtype_layout(x, dtype = torch.float64, layout = torch.strided, device = device(type='{self.device_type}')); x = None
view_as = torch.ops.aten.view_as.default(to, to); to = None
dtensor___init__0 = self.dtensor___init__0
dtensor_const_func_spec0 = self.dtensor_const_func_spec0
flat_apply = torch.ops.higher_order.flat_apply(dtensor_const_func_spec0, dtensor___init__0, view_as, DTensorSpec(mesh=DeviceMesh('{self.device_type}', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=torch.Size([8, 4]), stride=(4, 1), dtype=torch.float64), shard_order=(ShardOrderEntry(tensor_dim=0, mesh_dims=(0,)),)), False); dtensor_const_func_spec0 = dtensor___init__0 = view_as = None
add = torch.ops.aten.add.Tensor(b_buffer, flat_apply); b_buffer = flat_apply = None
access_subclass_inner_tensor_default_4 = torch.ops.export.access_subclass_inner_tensor.default(add, '_local_tensor'); add = None
view_as_1 = torch.ops.aten.view_as.default(access_subclass_inner_tensor_default_4, access_subclass_inner_tensor_default_4); access_subclass_inner_tensor_default_4 = None
    return (view_as_1,)
```
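The `flat_apply` call in the graph above receives the constant `DTensorSpec` inline, next to the traced tensor input. As a rough mental model (plain Python; `FakeSpec`, `FakeDTensor`, and this `flat_apply` are hypothetical stand-ins, not the real higher-order op, which also handles pytree unflattening), the op invokes a stored constructor on a mix of traced arguments and baked-in constants:

```python
from typing import Any, Callable, NamedTuple

# Hypothetical stand-ins for the real DTensor machinery.
class FakeSpec(NamedTuple):
    shape: tuple
    placement: str

class FakeDTensor(NamedTuple):
    local: list
    spec: FakeSpec

def flat_apply(fn: Callable, *flat_args: Any):
    # The real HOP also unflattens pytree structures; here the
    # args are already positional, so we just call fn.
    return fn(*flat_args)

# Graph-constant spec (analogous to the inlined DTensorSpec repr)
spec = FakeSpec(shape=(8, 4), placement="Shard(dim=0)")
# "Traced" tensor input (analogous to view_as)
local = [1.0, 2.0]

dt = flat_apply(FakeDTensor, local, spec)
print(dt.spec.placement)  # Shard(dim=0)
```

The key point this sketch illustrates: only the tensor argument flows through the graph as a dynamic input; the spec is reconstructed from a constant embedded at trace time.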
cc ezyang EikanWang jgong5 wenzhe-nrv
>     Returns FX-evaluable repr and required globals for Shard placement.
>     Needed for passing this type as an opaque object input to a custom op.
>     """
>     return f"torch.distributed.tensor.placement_types.Shard(dim={self.dim})", {}
OOC, how do you deal with situations where you need to trigger extra imports? Is that what the rhs is?
Yes! The right-hand side is expected to be a mapping from the FQN used in the repr to the type itself; we then add this mapping to the globals of the FX graph or Inductor code. But for opaque objects in the torch namespace, since torch is already in the globals, we don't need to add additional imports.
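The mechanism described above can be sketched in plain Python (the `Shard` class and `materialize` helper here are hypothetical stand-ins for illustration, not real torch APIs): the repr string is evaluated against a globals dict that has been extended with the required mapping.

```python
import types

class Shard:
    """Hypothetical stand-in for the real Shard placement type."""
    def __init__(self, dim: int):
        self.dim = dim

    def fx_repr(self):
        # Returns (FX-evaluable repr, mapping of the FQN root used in
        # the repr -> object), mirroring the method shown in the diff.
        return f"placements.Shard(dim={self.dim})", {
            "placements": types.SimpleNamespace(Shard=Shard)
        }

def materialize(repr_str, required_globals):
    # Generated FX/Inductor code would merge required_globals into its
    # module globals before evaluating the repr; torch-namespace objects
    # need no extra entry because torch is already in those globals.
    return eval(repr_str, dict(required_globals))

src, extra = Shard(dim=0).fx_repr()
obj = materialize(src, extra)
print(type(obj).__name__, obj.dim)  # Shard 0
```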
By marking DTensorSpec as a value-type opaque object, the Dynamo graph now looks like:
```python
def forward(self, L_x_ : torch.Tensor, L_mesh_ : torch.distributed.device_mesh.DeviceMesh):
l_x_ = L_x_
l_mesh_ = L_mesh_
dt = torch.distributed.tensor._api.from_local(l_x_, l_mesh_, [torch.distributed.tensor.placement_types.Shard(dim=0)], run_check = False); l_x_ = None
redistribute = dt.redistribute(device_mesh = l_mesh_, placements = [torch.distributed.tensor.placement_types.Replicate()]); dt = l_mesh_ = None
to_local = redistribute.to_local(); redistribute = None
add = to_local + 2; to_local = None
return (add,)
```
It takes the DeviceMesh as an input (since it is marked as a reference-type opaque object), calls from_local directly (since it is marked as a TorchInGraphFunctionVariable), and traces to_local/redistribute as call_methods on the tensors.
The AOTAutograd graph decomposes the from_local/to_local/redistribute operations and looks like:
```python
def forward(self, arg0_1, arg1_1):
_to_copy = torch.ops.aten._to_copy.default(arg0_1, dtype = torch.float32, layout = torch.strided, device = device(type='cuda', index=0)); arg0_1 = None
view = torch.ops.aten.view.default(_to_copy, [1]); _to_copy = None
all_gather_into_tensor = torch.ops._c10d_functional.all_gather_into_tensor.default(view, 2, '0'); view = None
wait_tensor = torch.ops._c10d_functional.wait_tensor.default(all_gather_into_tensor); all_gather_into_tensor = None
view_1 = torch.ops.aten.view.default(wait_tensor, [2]); wait_tensor = None
add = torch.ops.aten.add.Tensor(view_1, 2); view_1 = None
return (add,)
```
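The decomposition above turns `redistribute(Shard(dim=0) -> Replicate())` into a functional all-gather plus views. A plain-Python simulation of that data movement (list-based, no process groups; the function name echoes `_c10d_functional.all_gather_into_tensor` but is purely illustrative):

```python
# Simulate 2 ranks, each holding a 1-element shard of a length-2
# tensor sharded on dim 0.
world = {0: [10.0], 1: [20.0]}  # rank -> local shard

def all_gather_into_tensor(local, group_size, ranks_data):
    # Functional-collective analogue: concatenate every rank's shard
    # in rank order to produce the replicated full tensor.
    assert len(ranks_data) == group_size
    gathered = []
    for rank in sorted(ranks_data):
        gathered.extend(ranks_data[rank])
    return gathered

# After the gather, every rank computes the same replicated result,
# matching the trailing add in the AOTAutograd graph.
for rank, local in world.items():
    replicated = all_gather_into_tensor(local, 2, world)
    out = [v + 2 for v in replicated]
    assert out == [12.0, 22.0]
print("replicated add ok")
```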
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo Lucaskabela ezyang
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
This reverts commit 4416f11. Reverted #175510 on behalf of https://github.com/huydhn due to Per our discussion with @angelayi, revert this so that #169867 can be reverted, it is breaking a bunch of internal tests ([comment](#175510 (comment)))
Unsure what's the best way to fix things, but this seems to work! Pull Request resolved: #175510. Approved by: https://github.com/azahed98
@pytorchbot revert -m 'Per our discussion with @angelayi, revert this as it is breaking a bunch of internal tests' -c ghfirst

@pytorchbot successfully started a revert job. Check the current status here.

Reverting PR 169867 failed. Reason: command failed (details for the Dev Infra team; raised by workflow job).
Reverts DeviceMesh tracing changes in pytorch#169867. Pull Request resolved: pytorch#176485. Approved by: https://github.com/huydhn
Differential Revision: [D94288113](https://our.internmc.facebook.com/intern/diff/D94288113)
Pull Request resolved: pytorch#169867
Approved by: https://github.com/ezyang