[DTensor] fix copy_ strategy to support linearity #162460
tianyu-l wants to merge 1 commit into gh/tianyu-l/5/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162460
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 93d92fa with merge base 4840a1a.
dst_dtensor.copy_(src_dtensor)
dst_tensor.copy_(src_tensor)
Curious, shouldn't dst_dtensor.copy_(src_dtensor) have already modified dst_tensor? I think we don't need dst_tensor.copy_(src_tensor). The same pattern also appears in the tests above, though.
Oh interesting! I tried, but it didn't just copy to dst_tensor. Maybe let's land this PR and follow up offline.
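For context, here is a minimal sketch of the test pattern being discussed (tensor shapes and mesh size are made up, assuming the public `torch.distributed.tensor` API and a 2-rank job launched e.g. via torchrun):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Replicate

# Hypothetical setup: 2 ranks on a CPU (gloo) mesh, seeded so every rank
# builds the same "global" tensors.
torch.manual_seed(0)
mesh = init_device_mesh("cpu", (2,))

src_tensor = torch.randn(4, 4)
dst_tensor = torch.zeros(4, 4)

src_dtensor = distribute_tensor(src_tensor, mesh, [Replicate()])
dst_dtensor = distribute_tensor(dst_tensor, mesh, [Replicate()])

dst_dtensor.copy_(src_dtensor)  # in-place copy on the DTensor
dst_tensor.copy_(src_tensor)    # mirrored on the plain tensor for comparison

# The expected value is computed on the plain tensors, which is why the
# second copy_ is there even if the first one were to alias dst_tensor.
assert torch.equal(dst_dtensor.full_tensor(), dst_tensor)
```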
zpcore left a comment
LGTM! Left a comment for the test.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Nice catch. I intuitively thought linearity ought to have nothing to do with copy. The silent incorrectness part bothers me. If someone tried to copy_ from a Partial of one type into a Partial of another type, we'd still need to do replication for correctness' sake; would this PR still have the above bug in that case?
Fixing an issue introduced in pytorch#158538, where `aten.copy_.default` is registered as a pointwise op, but without linearity. In particular, when both `src` and `dst` tensors have the same `Partial` placements, a direct copy should happen without redistribution, instead of redistributing both to `Replicate` before making the copy. This was discovered from silently incorrect results, e.g. on `torch.einsum` backward. Pull Request resolved: pytorch#162460 Approved by: https://github.com/zpcore
Stack from ghstack (oldest at bottom):
Fixing an issue introduced in #158538, where `aten.copy_.default` is registered as a pointwise op, but without linearity.

In particular, when both `src` and `dst` tensors have the same `Partial` placements, a direct copy should happen without redistribution, instead of redistributing both to `Replicate` before making the copy.

This was discovered from silently incorrect results, e.g. on `torch.einsum` backward.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci
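As a rough illustration of the case being fixed, a sketch with made-up values (assuming the public `torch.distributed.tensor` API and a 2-rank CPU launch): when `src` and `dst` share the same `Partial` placement, `copy_` should copy the local shards directly rather than redistributing both sides to `Replicate` first.

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial

# Hypothetical setup: 2 ranks on a CPU (gloo) mesh, launched via torchrun.
mesh = init_device_mesh("cpu", (2,))

# Each rank holds a partial contribution; the logical value is the sum.
src = DTensor.from_local(torch.ones(4, 4), mesh, [Partial()])
dst = DTensor.from_local(torch.zeros(4, 4), mesh, [Partial()])

dst.copy_(src)  # with this fix: a per-rank local copy, no redistribution

# Placements are preserved and the logical value matches src's.
assert dst.placements == (Partial(),)
assert torch.equal(dst.full_tensor(), src.full_tensor())
```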