Fix inplace ops on Partial DTensors to preserve aliasing semantics#164729
RohitRathore1 wants to merge 5 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164729
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 unrelated failure.) As of commit ec907a6 with merge base 573a79f. UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 2a317ab to 06f31fb (compare)
@pytorchbot label "topic: not user facing"

lgtm, but I also cc'ed two more DTensor people. If they have not reviewed it by next week, I will stamp this PR.
ezyang left a comment
I don't think this is right. The problem is if there is a view into the partial tensor; just overwriting the local tensor is not enough to cause the views to update. In fact, I tend to think you are just toast if you need a redistribute on the tensor being inplace'd. I'd look more positively on something that just errors when this happens.
Thanks for the review @ezyang! You're absolutely right about the aliasing issue. I hadn't considered the case where there might be views into the partial tensor, i.e., simply overwriting `_local_tensor` would not update them.
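The hazard @ezyang describes can be sketched with plain (non-distributed) tensors; this is an illustration of the aliasing semantics, not DTensor internals:

```python
import torch

# A view aliases its base tensor's storage.
base = torch.ones(4)
view = base[:2]

# A genuine in-place op is visible through the view...
base.clamp_(max=0.5)
assert view[0].item() == 0.5

# ...but rebinding the base to a fresh tensor (analogous to overwriting
# DTensor._local_tensor with a redistributed result) does NOT update
# pre-existing views; they keep aliasing the old storage.
base = torch.zeros(4)
assert view[0].item() == 0.5  # the view still sees the old data
```

This is why just swapping out the local tensor under an aliased DTensor cannot preserve correctness for its views.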
@ezyang should I go ahead with this fix?

sure

@ezyang Thanks, I have updated my PR.
```
# update the spec for all inplace ops to handle placement changes
# that don't require redistribution
args[0]._spec = output_spec
```
I think it's still OK (and in fact preferred) for the squeeze_ logic to live inside the squeeze_ branch. Most inplace ops do NOT change the tensor meta. So because no redistribution could have occurred, the input cannot have a placement change.
Thanks @ezyang! You're absolutely right. I'll move the `args[0]._spec = output_spec` inside the `squeeze_.dim` branch.
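The structure being suggested can be sketched as follows. All names here (`FakeDTensor`, `dispatch_inplace`) are hypothetical stand-ins, not the real `torch.distributed.tensor` dispatch code; the point is only that the spec update is scoped to ops that change the tensor meta:

```python
# Hypothetical sketch: only meta-changing inplace ops refresh the spec.
class FakeDTensor:
    def __init__(self, spec):
        self._spec = spec

# squeeze_ changes the tensor meta; most inplace ops do not.
META_CHANGING_INPLACE_OPS = {"aten.squeeze_.dim"}

def dispatch_inplace(op_name, args, output_spec):
    if op_name in META_CHANGING_INPLACE_OPS:
        # the meta changed, so the spec must be refreshed
        args[0]._spec = output_spec
    # for clamp_, add_, etc. no redistribution could have occurred,
    # so the existing spec is left untouched
    return args[0]

dt = FakeDTensor(spec="old")
dispatch_inplace("aten.clamp_.default", [dt], output_spec="new")
assert dt._spec == "old"   # unchanged for clamp_
dispatch_inplace("aten.squeeze_.dim", [dt], output_spec="new")
assert dt._spec == "new"   # refreshed only in the squeeze_ branch
```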
Just a small tweak please, thanks

Done, thanks!

need to update tests

@ezyang are you referring to the failing test, or do I need to update an existing test?

your CI failed, fix it

@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased 4824268 to 80858d6 (compare)
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Per @ezyang's feedback, inplace operations that require redistribution (e.g., Partial -> Replicate) now raise a RuntimeError instead of trying to work around the issue by overwriting _local_tensor.
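The guard described above can be sketched like this. The function name and error message are hypothetical, not the actual DTensor implementation; it only illustrates the "raise instead of silently overwriting" behavior:

```python
# Hypothetical sketch of the guard: if an inplace op would need its
# input redistributed (e.g. Partial -> Replicate), raise rather than
# overwrite the local tensor and break existing views.
def check_inplace_allowed(input_placements, required_placements):
    if tuple(input_placements) != tuple(required_placements):
        raise RuntimeError(
            f"in-place operation requires redistribution "
            f"{input_placements} -> {required_placements}, which would "
            f"break aliasing with existing views of the input"
        )

check_inplace_allowed(("Replicate",), ("Replicate",))  # no-op, allowed
try:
    check_inplace_allowed(("Partial",), ("Replicate",))
except RuntimeError as e:
    print("raised:", e)
```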
Successfully rebased 80858d6 to 73f9ddc (compare)
@ezyang I've fixed the CI failure. The issue was that the placement change check was happening before the …
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fix inplace ops on Partial DTensors to preserve aliasing semantics (pytorch#164729)

Fixes pytorch#163374. Here is the output from the reproducible code:

```
W1006 09:09:26.329000 2457 /home/fedora/github/pytorch/torch/distributed/run.py:811]
W1006 09:09:26.329000 2457 /home/fedora/github/pytorch/torch/distributed/run.py:811] *****************************************
W1006 09:09:26.329000 2457 /home/fedora/github/pytorch/torch/distributed/run.py:811] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1006 09:09:26.329000 2457 /home/fedora/github/pytorch/torch/distributed/run.py:811] *****************************************
aten::clamp_(dt: f32[][R], None, 2)
  redistribute_input(0, [P] -> [R])
  redistribute_input(t: f32[], [P] -> [R])
  _c10d_functional::all_reduce(t: f32[], sum, 0)
  _c10d_functional::wait_tensor(t: f32[])
  aten::clamp_(t: f32[], None, 2)
  aten::view(t: f32[], [])
(Replicate(),)
tensor(2., device='cuda:0')
```

The behavior now matches what was expected in issue pytorch#163374.

Expected behavior (from the issue):
1. Placement should change from Partial(sum) to Replicate()
2. Value should be tensor(2.) instead of tensor(144.)

Actual output from this build:
1. (Replicate(),): placement is correct
2. tensor(2., device='cuda:0'): value is correct

So the inplace operation now properly redistributes the partial DTensor to replicate before performing the clamp and maintains the correct aliasing semantics. It also produces the expected clamped value.

Pull Request resolved: pytorch#164729
Approved by: https://github.com/ezyang
cc @SherlockNoMad @ezyang @janeyx99 @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci