[coor-slicing] extract _get_mesh_tensor_from_full_mesh #169550
aorenste wants to merge 10 commits into gh/aorenste/157/base from
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169550
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 91 Pending as of commit 159b331 with merge base dc48fef. UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 2f72461 Pull Request resolved: pytorch/pytorch#169550
For compile on one rank we need to be able to compute the DeviceMesh rank Tensor from the raw Tensor and the current rank. So this PR factors `DeviceMesh._get_mesh_tensor_from_full_mesh()` out into a static method.
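The shape of the refactor can be illustrated with a self-contained sketch. Everything here is an illustrative assumption, not the actual PyTorch implementation: nested Python lists stand in for the raw mesh Tensor, and the class and chunk layout are invented. The point is that a static method depends only on its arguments, so the per-rank sub-mesh can be computed while compiling on a single rank, without a fully constructed DeviceMesh instance.

```python
# Illustrative sketch only -- not the real torch.distributed.DeviceMesh code.
# Nested Python lists stand in for the raw mesh Tensor of ranks.

class DeviceMeshSketch:
    @staticmethod
    def _get_mesh_tensor_from_full_mesh(full_mesh, rank):
        """Given the full mesh (a list of candidate sub-meshes, each a
        nested list of ranks) and the current rank, return the sub-mesh
        containing that rank. Being static, this depends only on its
        arguments, so it can run while tracing on one rank."""
        for sub_mesh in full_mesh:
            if any(rank in row for row in sub_mesh):
                return sub_mesh
        raise ValueError(f"rank {rank} not found in the full mesh")


# Example: a 2x2x2 full mesh made of two 2x2 sub-meshes.
full = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
print(DeviceMeshSketch._get_mesh_tensor_from_full_mesh(full, 5))
# → [[4, 5], [6, 7]]
```

Because the lookup takes the full mesh and the rank explicitly, a compiler tracing rank 5 alone can reproduce the same answer every participating rank would compute.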
Starting merge as part of PR stack under #169551
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed; the first few of them are: trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu), trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
Starting merge as part of PR stack under #169551
`Placement._split_tensor()` computes and returns too much information; in general, most callers invoke it and then throw away most of the results. This PR adds `Placement._select_split_tensor()`, which lets the caller specify which parts they want so that only those bits are computed. In essence it is the combination of `Placement._split_tensor()` and `Shard._select_shard()`.
Pull Request resolved: #169551
Approved by: https://github.com/ezyang
ghstack dependencies: #169549, #169550
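The motivation can be sketched with a toy helper. The name `select_split`, its signature, and the even-chunking scheme are all illustrative assumptions, not the real `Placement` API: the idea is simply that the caller names the pieces it wants, and only those are computed, rather than a `_split_tensor()`-style call that always materializes everything.

```python
# Illustrative sketch only -- not the real Placement._select_split_tensor().
# Even chunking of a Python list stands in for tensor sharding.

def select_split(seq, num_shards, rank, want_shard=True,
                 want_sizes=False, want_pad=False):
    """Split `seq` into `num_shards` chunks and return only the pieces
    the caller asked for, skipping the work for everything else."""
    chunk = -(-len(seq) // num_shards)  # ceil division: per-shard capacity
    shards = [seq[i * chunk:(i + 1) * chunk] for i in range(num_shards)]
    out = {}
    if want_shard:
        out["shard"] = shards[rank]              # this rank's local chunk
    if want_sizes:
        out["sizes"] = [len(s) for s in shards]  # every shard's logical size
    if want_pad:
        out["pad"] = chunk - len(shards[rank])   # padding up to full capacity
    return out


# Most callers want just their own shard; nothing else is computed.
print(select_split(list(range(10)), num_shards=4, rank=3))
# → {'shard': [9]}
print(select_split(list(range(10)), num_shards=4, rank=3,
                   want_sizes=True, want_pad=True))
# → {'shard': [9], 'sizes': [3, 3, 3, 1], 'pad': 2}
```

The default path returns only the local shard; callers that also need shard sizes or padding opt in explicitly, which is the same select-what-you-need shape the PR describes.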
@pytorchbot revert -m "seems to be breaking internal signals, see D90448078" -c ghfirst
@pytorchbot successfully started a revert job. Check the current status here.
@aorenste your PR has been successfully reverted.
Revert "[coor-slicing] extract _get_mesh_tensor_from_full_mesh (#169550)"
This reverts commit 649d9b3. Reverted #169550 on behalf of https://github.com/jeanschmidt due to seems to be breaking internal signals, see D90448078 ([comment](#169550 (comment)))
Starting merge as part of PR stack under #169551
For compile on one rank we need to be able to compute the DeviceMesh rank Tensor from the raw Tensor and the current rank. So this PR factors `DeviceMesh._get_mesh_tensor_from_full_mesh()` out into a static method.
Pull Request resolved: pytorch#169550
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#169549
ghstack-source-id: 34bf40c Pull Request resolved: pytorch/pytorch#169550
ghstack-source-id: 96485cf Pull Request resolved: pytorch/pytorch#169550
For compile on one rank we need to be able to compute the DeviceMesh rank Tensor from the raw Tensor and the current rank. So this PR factors `DeviceMesh._get_mesh_tensor_from_full_mesh()` out into a static method.
Stack from ghstack (oldest at bottom):