Fixing get_local_rank() variable missing when compiled #165432
arkadip-maitra wants to merge 3 commits into pytorch:main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165432
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit e1ffbdf with merge base 74db92b. FLAKY - The following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"

Didn't find following labels among repository labels: module:DeviceMesh

@pytorchbot label "module: DeviceMesh"
```python
const_args = [x.as_python_constant() for x in args]
const_kwargs = {k: v.as_python_constant() for k, v in kwargs.items()}
return ConstantVariable.create(
    self.value.get_local_rank(*const_args, **const_kwargs)
)
```
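To illustrate the idea behind the diff above: when every argument is a compile-time constant, the method call can be evaluated eagerly and wrapped back up as a constant. The following is a minimal standalone sketch of that constant-folding pattern; the classes here (`ConstantVariable`, `DeviceMeshVariable`, `FakeMesh`) are simplified stand-ins, not the real Dynamo implementation.

```python
# Hypothetical sketch of constant-folding a method call whose arguments
# are all compile-time constants (simplified, not the real Dynamo classes).
class ConstantVariable:
    def __init__(self, value):
        self.value = value

    def as_python_constant(self):
        return self.value

    @classmethod
    def create(cls, value):
        return cls(value)


class DeviceMeshVariable:
    def __init__(self, value):
        # `value` stands in for the underlying DeviceMesh-like object.
        self.value = value

    def call_get_local_rank(self, args, kwargs):
        # Unwrap every argument to its Python constant, call the real
        # method eagerly, and wrap the result as a constant again.
        const_args = [x.as_python_constant() for x in args]
        const_kwargs = {k: v.as_python_constant() for k, v in kwargs.items()}
        return ConstantVariable.create(
            self.value.get_local_rank(*const_args, **const_kwargs)
        )


class FakeMesh:
    def get_local_rank(self, mesh_dim=None):
        return 3  # pretend this process has local rank 3


mesh_var = DeviceMeshVariable(FakeMesh())
result = mesh_var.call_get_local_rank([], {})
print(result.as_python_constant())  # prints 3
```

The key point is that the folded result is a `ConstantVariable`, so downstream tracing sees a plain Python int rather than an opaque method call.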
can you add a test? you can put it in test_dtensor_compile.py, similar to https://github.com/pytorch/pytorch/blob/main/test/distributed/tensor/test_dtensor_compile.py#L219 (the test should compile something like the below, and test that the output is the same under compile and eager)

```python
def f(dtensor):
    local_rank = dtensor.device_mesh.get_local_rank("dp")
    return dtensor * local_rank
```
we should also add a test that this handles the `mesh_dim` argument being optional
> we should also add a test that this handles the `mesh_dim` argument being optional

mesh_dim argument not being passed should fail? can you explain what you mean?
> can you add a test? you can put it in test_dtensor_compile.py, similar to https://github.com/pytorch/pytorch/blob/main/test/distributed/tensor/test_dtensor_compile.py#L219

Added test.
> mesh_dim argument not being passed should fail? can you explain what you mean?

in the eager API, it looks like you don't have to pass in a mesh_dim; we won't error as long as ndim == 1: https://github.com/pytorch/pytorch/blob/main/torch/distributed/device_mesh.py#L1050. So we should have a separate test for that.
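The fallback being described can be sketched roughly like this. This is a hypothetical standalone illustration of the behavior, not the real `DeviceMesh.get_local_rank` code; `mesh_ndim` and the return value are simplifications.

```python
# Hypothetical sketch of the eager behavior described above: omitting
# mesh_dim is only accepted when the mesh is 1-dimensional.
def get_local_rank(mesh_ndim, mesh_dim=None):
    if mesh_dim is None:
        if mesh_ndim != 1:
            raise RuntimeError(
                "Optional kwarg `mesh_dim` needs to be specified "
                "when the DeviceMesh has more than 1 dimension."
            )
        mesh_dim = 0  # a 1-D mesh has only one dimension to query
    # Placeholder: the real implementation queries the rank within the
    # process group for the chosen mesh dimension.
    return mesh_dim


print(get_local_rank(1))  # 1-D mesh: mesh_dim defaults to 0, prints 0
```

A separate test for the no-`mesh_dim` path would then compile a function calling `get_local_rank()` on a 1-D mesh and compare against eager.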
> in the eager API, it looks like you don't have to pass in a mesh_dim; we won't error as long as ndim == 1. So we should have a separate test for that.

Thanks. Added that test case. It should be good now.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes pytorch#165215 Pull Request resolved: pytorch#165432 Approved by: https://github.com/bdhirsh
Fixes #165215
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela