[coor-slicing] DeviceMesh.is_current_rank_part_of_mesh #169548
Closed
aorenste wants to merge 11 commits into gh/aorenste/155/base from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169548
Note: Links to docs will display an error until the doc builds have completed.
✅ No Failures as of commit e2055ee with merge base d26db89.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This was referenced Dec 4, 2025
tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request on Dec 10, 2025:
ghstack-source-id: c24ef48
Pull Request resolved: pytorch/pytorch#169548
Adds two methods to DeviceMesh:
- `is_current_rank_part_of_mesh`
  There are a number of places where we only care whether the current rank is part of the DeviceMesh and don't need any other information. So instead of getting all the mesh coordinates and checking for `None`, we can just have a predicate that says whether we are part of the mesh or not.
- `sym_get_coordinate`
  Morally equivalent to `get_coordinate()[i]`: instead of getting all the mesh coordinates as a list and extracting the one we want, this allows specifying the mesh dim and getting just the coordinate on that dim. Right now it only returns `int`, but in the future it can also return a `SymInt`.

Today both of these are a simple lookup in the `_coordinate_on_dim` array, but a later PR will specialize them to properly limit their scope and make compile-on-one-rank happier.
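For illustration, a minimal sketch of how the two methods would replace the existing coordinate-based pattern, assuming the signatures match the description above (in particular, `sym_get_coordinate` taking a mesh dim index is an assumption); the 2x4 mesh shape and dim names are made up:

```python
from torch.distributed.device_mesh import init_device_mesh

# Illustrative 2-D mesh over 8 ranks; shape and names are arbitrary.
mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

# Old pattern: materialize the full coordinate list just to test membership.
participating = mesh.get_coordinate() is not None

# New pattern (assumed signatures from this PR's description):
if mesh.is_current_rank_part_of_mesh():
    # Morally get_coordinate()[0], without building the whole list.
    dp_coord = mesh.sym_get_coordinate(0)
```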
Contributor (Author):
@pytorchbot merge

Collaborator:
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
hinriksnaer pushed a commit to hinriksnaer/pytorch that referenced this pull request on Jan 12, 2026:
Adds two methods to DeviceMesh: `is_current_rank_part_of_mesh` and `sym_get_coordinate` (described above).
Also the LocalTensorMode ad-hoc method patching was becoming unwieldy (and I had to add more items to it), so I automated it from a list instead of easy-to-miss one-offs.
Pull Request resolved: pytorch#169548
Approved by: https://github.com/zpcore, https://github.com/bobrenjc93
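List-driven patching of the kind described in that commit message might look like the following sketch; the list contents, function names, and wrapper mechanism here are hypothetical stand-ins, not the actual LocalTensorMode implementation:

```python
# Hypothetical sketch of list-driven method patching; the real
# LocalTensorMode code may differ.
_DEVICE_MESH_METHODS_TO_PATCH = [
    "get_coordinate",
    "is_current_rank_part_of_mesh",  # added in this PR
    "sym_get_coordinate",            # added in this PR
]

def patch_device_mesh_methods(mesh_cls, make_wrapper):
    """Wrap every listed method in one loop, so adding a method means
    editing one list rather than scattering easy-to-miss one-offs."""
    for name in _DEVICE_MESH_METHODS_TO_PATCH:
        original = getattr(mesh_cls, name)  # fail loudly if the list rots
        setattr(mesh_cls, name, make_wrapper(original))
```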
wconstab added a commit that referenced this pull request on Jan 14, 2026:
Previously, ranks not participating in redistribution would hit an assert in the redistribution planner that the rank was participating. The assert in question was added recently in #169548 by @aorenste, and I'm not sure if patching an early exit in this PR is the best fix or rethinking the original assert. Also cc @pianpwk for discussion.
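An early exit of the kind being weighed here could use the predicate this PR adds; the sketch below is only a guess at the shape of that fix, with `plan_redistribution` and its body as hypothetical stand-ins for the actual planner code:

```python
# Hypothetical sketch of the early-exit option; not the actual planner.
def plan_redistribution(mesh, src_placements, dst_placements):
    if not mesh.is_current_rank_part_of_mesh():
        # Non-participating ranks have no local shard to move; returning
        # early avoids the participation assert added in #169548.
        return None
    coord = mesh.get_coordinate()
    assert coord is not None, "past this point the rank must participate"
    # ... real planning logic would go here ...
```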
wconstab added a commit that referenced this pull request on Jan 15, 2026:

…n non-participating ranks"

Previously, ranks not participating in redistribution would hit an assert in the redistribution planner that the rank was participating. The assert in question was added recently in #169548 by aorenste, and I'm not sure if patching an early exit in this PR is the best fix or rethinking the original assert. Also cc pianpwk for discussion.

This PR is to fix an error happening on this test:

```python
@with_comms
def test_from_local_sub_mesh(self):
    mesh = DeviceMesh(self.device_type, [0, 2])
    local_tensor = torch.ones(3, 4)
    dtensor = DTensor.from_local(local_tensor, mesh, [Shard(0)])
    self.assertEqual(dtensor.size(), torch.Size([6, 4]))
    self.sub_mesh_assert_equal(
        mesh.mesh,
        torch.ones(3, 4),
        torch.tensor([]),
        dtensor.to_local(),
    )
    # test dtensor created in submesh, the operation should only
    # be applied to the local shard inside the mesh, not the whole
    # world, so only 0/2 really run the computation
    dtensor = dtensor + 2
    self.sub_mesh_assert_equal(
        mesh.mesh,
        torch.ones(3, 4) + 2,
        torch.tensor([]),
        dtensor.to_local(),
    )
```

After looking at the test, I am very confused about why we support this behavior in the first place. aorenste suggested maybe we should just make DTensor.from_local error out on ranks that aren't included in the mesh. I am not sure why we want to allow Python code to run on non-participating ranks, go through DTensor dispatch, and return a DTensor object that is defunct.

Claude summarized the behavior: we run shard prop and shape prop on every (including non-participating) rank:

| Question | Answer |
|---|---|
| Value of `dtensor + 2` on excluded ranks | Empty tensor `torch.tensor([])` |
| Has global shape? | Yes - `dtensor.size()` returns `(6, 4)` |
| Has placements? | Yes - same as participating ranks |
| Runs shape propagation? | Yes - output spec is computed, just no local computation |

The design ensures all ranks can query DTensor properties consistently while only participating ranks do actual computation.
SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request on Jan 19, 2026:
ghstack-source-id: 8166443
Pull Request resolved: pytorch/pytorch#169548
SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request on Jan 19, 2026:
ghstack-source-id: 6b55006
Pull Request resolved: pytorch/pytorch#169548
pytorchmergebot pushed a commit that referenced this pull request on Jan 27, 2026:

…172478)

Previously, ranks not participating in redistribution would hit an assert in the redistribution planner that the rank was participating. The assert in question was added recently in #169548 by aorenste, and I'm not sure if patching an early exit in this PR is the best fix or rethinking the original assert. Also cc pianpwk for discussion.

Pull Request resolved: #172478
Approved by: https://github.com/pianpwk
riccardofelluga pushed a commit to riccardofelluga/pytorch that referenced this pull request on Jan 27, 2026:

…ytorch#172478)

Pull Request resolved: pytorch#172478
Approved by: https://github.com/pianpwk
Adds two methods to DeviceMesh:

- `is_current_rank_part_of_mesh`
  There are a number of places where we only care whether the current rank is part of the DeviceMesh and don't need any other information. So instead of getting all the mesh coordinates and checking for `None`, we can just have a predicate that says whether we are part of the mesh or not.
- `sym_get_coordinate`
  Morally equivalent to `get_coordinate()[i]`: instead of getting all the mesh coordinates as a list and extracting the one we want, this allows specifying the mesh dim and getting just the coordinate on that dim. Right now it only returns `int`, but in the future it can also return a `SymInt`.

Today both of these are a simple lookup in the `_coordinate_on_dim` array, but a later PR will specialize them to properly limit their scope and make compile-on-one-rank happier.

Also the LocalTensorMode ad-hoc method patching was becoming unwieldy (and I had to add more items to it), so I automated it from a list instead of easy-to-miss one-offs.

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo