
Make DeviceMesh opaque #169867

Closed

angelayi wants to merge 27 commits into gh/angelayi/147/base from gh/angelayi/147/head

Conversation

Contributor

@angelayi angelayi commented Dec 8, 2025

By marking DTensorSpec as a value-type opaque object, the Dynamo graph now looks like:

```python
def forward(self, L_x_ : torch.Tensor, L_mesh_ : torch.distributed.device_mesh.DeviceMesh):
    l_x_ = L_x_
    l_mesh_ = L_mesh_
    dt = torch.distributed.tensor._api.from_local(l_x_, l_mesh_, [torch.distributed.tensor.placement_types.Shard(dim=0)], run_check = False);  l_x_ = None
    redistribute = dt.redistribute(device_mesh = l_mesh_, placements = [torch.distributed.tensor.placement_types.Replicate()]);  dt = l_mesh_ = None
    to_local = redistribute.to_local();  redistribute = None
    add = to_local + 2;  to_local = None
    return (add,)
```

It takes the DeviceMesh in as a graph input (since DeviceMesh is marked as a reference-type opaque object), calls from_local directly in the graph (since it is marked as a TorchInGraphFunctionVariable), and traces to_local/redistribute as call_method nodes on the tensors.
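
For reference, a minimal user-code sketch (not taken from this PR's tests) that would compile to a graph of this shape, assuming a 2-rank CUDA setup with the process group already initialized:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Replicate, Shard

# Assumes torch.distributed is initialized with a world size of 2.
mesh = init_device_mesh("cuda", (2,))

@torch.compile(fullgraph=True)
def fn(x, mesh):
    # from_local is traced as a direct call; redistribute/to_local become
    # call_method nodes on the resulting tensor, matching the graph above.
    dt = DTensor.from_local(x, mesh, [Shard(0)], run_check=False)
    replicated = dt.redistribute(device_mesh=mesh, placements=[Replicate()])
    return replicated.to_local() + 2
```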

The AOTAutograd graph decomposes the from_local/to_local/redistribute operations and looks like:

```python
def forward(self, arg0_1, arg1_1):
    _to_copy = torch.ops.aten._to_copy.default(arg0_1, dtype = torch.float32, layout = torch.strided, device = device(type='cuda', index=0));  arg0_1 = None
    view = torch.ops.aten.view.default(_to_copy, [1]);  _to_copy = None
    all_gather_into_tensor = torch.ops._c10d_functional.all_gather_into_tensor.default(view, 2, '0');  view = None
    wait_tensor = torch.ops._c10d_functional.wait_tensor.default(all_gather_into_tensor);  all_gather_into_tensor = None
    view_1 = torch.ops.aten.view.default(wait_tensor, [2]);  wait_tensor = None
    add = torch.ops.aten.add.Tensor(view_1, 2);  view_1 = None
    return (add,)
```
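
At the Python level, the traced collective pair corresponds roughly to a functional all-gather; a hedged sketch using the public functional-collectives wrapper (argument handling may differ from what AOTAutograd emits):

```python
import torch
import torch.distributed._functional_collectives as funcol

def shard0_to_replicate(local: torch.Tensor, mesh) -> torch.Tensor:
    # Gathers the dim-0 shards from all ranks into one tensor; the result
    # is an async tensor that is implicitly waited on at first real use,
    # mirroring the all_gather_into_tensor + wait_tensor pair above.
    return funcol.all_gather_tensor(local, gather_dim=0, group=mesh)
```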


cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @Lucaskabela @ezyang

Differential Revision: D94288113


pytorch-bot Bot commented Dec 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169867

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 07d0720 with merge base dbf7019:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot Bot added the ciflow/inductor and release notes: fx labels on Dec 8, 2025
angelayi added a commit that referenced this pull request Dec 8, 2025
ghstack-source-id: ed82ea5
Pull Request resolved: #169867
cc ezyang EikanWang jgong5 wenzhe-nrv

[ghstack-poisoned]
angelayi added a commit that referenced this pull request Dec 9, 2025
ghstack-source-id: 47dc82b
Pull Request resolved: #169867
cc ezyang EikanWang jgong5 wenzhe-nrv

[ghstack-poisoned]
angelayi added a commit that referenced this pull request Dec 9, 2025
ghstack-source-id: a291ffb
Pull Request resolved: #169867
cc ezyang EikanWang jgong5 wenzhe-nrv

[ghstack-poisoned]
By marking DTensorSpec as a value-type opaque object, the graph looks like:
```python
def forward(self, b_buffer, x):
    _assert_tensor_metadata_default = torch.ops.aten._assert_tensor_metadata.default(x, dtype = torch.float64, device = device(type='cpu'), layout = torch.strided);  _assert_tensor_metadata_default = None
    to = torch.ops.aten.to.dtype_layout(x, dtype = torch.float64, layout = torch.strided, device = device(type='cuda'));  x = None
    view_as = torch.ops.aten.view_as.default(to, to);  to = None
    dtensor___init__0 = self.dtensor___init__0
    dtensor_const_func_spec0 = self.dtensor_const_func_spec0
    flat_apply = torch.ops.higher_order.flat_apply(dtensor_const_func_spec0, dtensor___init__0, view_as, DTensorSpec(mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=torch.Size([8, 4]), stride=(4, 1), dtype=torch.float64), shard_order=(ShardOrderEntry(tensor_dim=0, mesh_dims=(0,)),)), False);  dtensor_const_func_spec0 = dtensor___init__0 = view_as = None
    add = torch.ops.aten.add.Tensor(b_buffer, flat_apply);  b_buffer = flat_apply = None
    access_subclass_inner_tensor_default_4 = torch.ops.export.access_subclass_inner_tensor.default(add, '_local_tensor');  add = None
    view_as_1 = torch.ops.aten.view_as.default(access_subclass_inner_tensor_default_4, access_subclass_inner_tensor_default_4);  access_subclass_inner_tensor_default_4 = None
    return (view_as_1,)
```




cc ezyang EikanWang jgong5 wenzhe-nrv

[ghstack-poisoned]
angelayi added a commit that referenced this pull request Dec 10, 2025
ghstack-source-id: ee9978a
Pull Request resolved: #169867
```python
"""
Returns FX-evaluable repr and required globals for Shard placement.
Needed for passing this type as an opaque object input to a custom op.
"""
return f"torch.distributed.tensor.placement_types.Shard(dim={self.dim})", {}
```
Contributor


OOC, how do you deal with situations where you need to trigger extra imports? Is that what the rhs is?

Contributor Author

@angelayi angelayi Dec 11, 2025


Yes! The right-hand side is expected to be a mapping from the FQN used in the repr to the type itself; we then add this mapping to the globals of the FX graph or Inductor code. But in the case of opaque objects in the torch namespace, torch is already in the globals, so we don't need to add additional imports.
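
A toy, self-contained sketch of that contract (the hook name `fx_repr` and the class are illustrative only, not the actual PyTorch API): the method returns an FX-evaluable expression plus the globals needed to evaluate it.

```python
class MyShard:
    """Illustrative placement-like value type; not a real torch class."""

    def __init__(self, dim: int):
        self.dim = dim

    def fx_repr(self):  # hypothetical hook name
        # Left: a string that rebuilds an equivalent object when evaluated
        # in the FX graph's globals. Right: a mapping from each root name
        # the string references to the object to inject into those globals
        # (empty when the name, e.g. torch, is already present).
        return f"MyShard(dim={self.dim})", {"MyShard": MyShard}

# Round-trip check: evaluating the repr with the extra globals rebuilds
# an equivalent object.
expr, extra_globals = MyShard(0).fx_repr()
rebuilt = eval(expr, {**globals(), **extra_globals})
assert isinstance(rebuilt, MyShard) and rebuilt.dim == 0
```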

angelayi added a commit that referenced this pull request Feb 24, 2026
ghstack-source-id: db1d360
Pull Request resolved: #169867
angelayi added a commit that referenced this pull request Feb 24, 2026
ghstack-source-id: 2849a09
Pull Request resolved: #169867
Contributor Author

@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@angelayi
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot added a commit that referenced this pull request Mar 4, 2026
This reverts commit 4416f11.

Reverted #175510 on behalf of https://github.com/huydhn due to Per our discussion with @angelayi, revert this so that #169867 can be reverted, it is breaking a bunch of internal tests ([comment](#175510 (comment)))
pytorchmergebot referenced this pull request Mar 4, 2026
Unsure what's the best way to fix things, but this seems to work!

Pull Request resolved: #175510
Approved by: https://github.com/azahed98
@huydhn
Contributor

huydhn commented Mar 4, 2026

@pytorchbot revert -m 'Per our discussion with @angelayi, revert this as it is breaking a bunch of internal tests' -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

Reverting PR 169867 failed

Reason: Command git -C /home/runner/work/pytorch/pytorch revert --no-edit c68fcaa704f9be38de29d063b680393c987aaa7f returned non-zero exit code 1

```
Auto-merging test/distributed/tensor/test_dtensor_compile.py
Auto-merging test/test_opaque_obj_v2.py
Auto-merging torch/_dynamo/debug_utils.py
CONFLICT (content): Merge conflict in torch/_dynamo/debug_utils.py
Auto-merging torch/_dynamo/functional_export.py
CONFLICT (content): Merge conflict in torch/_dynamo/functional_export.py
Auto-merging torch/_dynamo/guards.py
Auto-merging torch/_dynamo/variables/builder.py
CONFLICT (content): Merge conflict in torch/_dynamo/variables/builder.py
Auto-merging torch/_dynamo/variables/script_object.py
Auto-merging torch/_dynamo/variables/tensor.py
Auto-merging torch/_dynamo/variables/torch.py
Auto-merging torch/_inductor/constant_folding.py
Auto-merging torch/_library/fake_class_registry.py
Auto-merging torch/distributed/_functional_collectives.py
Auto-merging torch/distributed/device_mesh.py
Auto-merging torch/distributed/tensor/_api.py
Auto-merging torch/fx/node.py
error: could not revert c68fcaa704f... Make DeviceMesh opaque (#169867)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git revert --continue".
hint: You can instead skip this commit with "git revert --skip".
hint: To abort and get back to the state before "git revert",
hint: run "git revert --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
```
Details for Dev Infra team. Raised by workflow job.

angelayi added a commit that referenced this pull request Mar 4, 2026
@angelayi angelayi mentioned this pull request Mar 4, 2026
angelayi added a commit that referenced this pull request Mar 4, 2026
pytorchmergebot added a commit to anatoliylitv/pytorch that referenced this pull request Mar 4, 2026
This reverts commit 4416f11.

Reverted pytorch#175510 on behalf of https://github.com/huydhn due to Per our discussion with @angelayi, revert this so that pytorch#169867 can be reverted, it is breaking a bunch of internal tests ([comment](pytorch#175510 (comment)))
pytorchmergebot pushed a commit that referenced this pull request Mar 5, 2026
Reverts DeviceMesh tracing changes in #169867

Pull Request resolved: #176485
Approved by: https://github.com/huydhn
Vighaneshs pushed a commit to Vighaneshs/pytorch that referenced this pull request Mar 5, 2026
Reverts DeviceMesh tracing changes in pytorch#169867

Pull Request resolved: pytorch#176485
Approved by: https://github.com/huydhn
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Differential Revision: [D94288113](https://our.internmc.facebook.com/intern/diff/D94288113)
Pull Request resolved: pytorch#169867
Approved by: https://github.com/ezyang
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
This reverts commit 4416f11.

Reverted pytorch#175510 on behalf of https://github.com/huydhn due to Per our discussion with @angelayi, revert this so that pytorch#169867 can be reverted, it is breaking a bunch of internal tests ([comment](pytorch#175510 (comment)))
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Reverts DeviceMesh tracing changes in pytorch#169867

Pull Request resolved: pytorch#176485
Approved by: https://github.com/huydhn
@github-actions github-actions Bot deleted the gh/angelayi/147/head branch April 4, 2026 02:22

Labels

ci-no-td, ciflow/inductor, ciflow/trunk, keep-going, Merged, module: dynamo, module: inductor, release notes: distributed (dtensor), topic: not user facing
