
[DTensor] fix copy_ strategy to support linearity#162460

Closed
tianyu-l wants to merge 1 commit into gh/tianyu-l/5/base from gh/tianyu-l/5/head

Conversation

@tianyu-l
Contributor

@tianyu-l tianyu-l commented Sep 9, 2025

Stack from ghstack (oldest at bottom):

Fixing an issue introduced in #158538, where `aten.copy_.default` was registered as a pointwise op but without linearity.

In particular, when both the `src` and `dst` tensors have the same `Partial` placements, the copy should happen directly without redistribution, instead of redistributing both to `Replicate` before making the copy.

This was discovered via silently incorrect results, e.g. in the backward pass of `torch.einsum`.
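The linearity argument can be illustrated without any distributed setup. In the sketch below (plain Python lists standing in for per-rank local tensors, not the DTensor API), each rank holds one addend of the logical tensor under a `Partial(sum)` placement; copying addend-by-addend preserves the elementwise sum, which is why no redistribution to `Replicate` is needed when `src` and `dst` placements match:

```python
# Plain-Python sketch (hypothetical shard values, not DTensor) of why copy_
# is linear over matching Partial(sum) placements: the full tensor is the
# elementwise sum of per-rank shards, and a per-rank copy preserves that sum.
world_size = 2
src_shards = [[1.0, 2.0], [3.0, 4.0]]  # Partial(sum): full src == [4.0, 6.0]
dst_shards = [[9.0, 9.0], [9.0, 9.0]]  # Partial(sum): full dst == [18.0, 18.0]

# Direct per-rank copy -- the behavior this PR restores; no communication.
for rank in range(world_size):
    dst_shards[rank] = list(src_shards[rank])

# Reconstruct the logical tensors by summing across ranks.
full_src = [sum(vals) for vals in zip(*src_shards)]
full_dst = [sum(vals) for vals in zip(*dst_shards)]
assert full_dst == full_src == [4.0, 6.0]
```

If the placements did not match (e.g. different `Partial` reduction types), a per-rank copy would no longer preserve the logical value, which is why redistribution is still needed in that case.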

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Sep 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162460

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 93d92fa with merge base 4840a1a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the ciflow/inductor and oncall: distributed labels Sep 9, 2025
tianyu-l added a commit that referenced this pull request Sep 9, 2025
ghstack-source-id: 2c80ec6
Pull Request resolved: #162460
@tianyu-l tianyu-l requested review from ezyang and wanchaol September 9, 2025 06:16
@tianyu-l tianyu-l added the ciflow/trunk, topic: not user facing, and release notes: distributed (dtensor) labels Sep 9, 2025
@tianyu-l tianyu-l requested review from XilunWu and zpcore September 9, 2025 06:18
Comment on lines +104 to +105
dst_dtensor.copy_(src_dtensor)
dst_tensor.copy_(src_tensor)
Member


Curious, shouldn't `dst_dtensor.copy_(src_dtensor)` have already modified `dst_tensor`? I think we don't need `dst_tensor.copy_(src_tensor)`. The same pattern also appears in the tests above, though.

Contributor Author


Oh interesting! I tried, but it didn't copy through to `dst_tensor`. Maybe let's land this PR and follow up offline.

Member

@zpcore zpcore left a comment


LGTM! Left a comment for the test.

@tianyu-l
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@wconstab
Contributor

Nice catch. I intuitively thought linearity ought to have nothing to do with copy.

The silent incorrectness part bothers me: is that because we copied `src` into a replicated copy of `dst` and then threw that away without modifying the original `dst`?

If someone tried to `copy_` from a partial of one type into a partial of another type, we'd still need to do replication for correctness' sake. Would this PR still have the above bug in that case?

@tianyu-l tianyu-l deleted the gh/tianyu-l/5/head branch September 10, 2025 04:37
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Fixing issue introduced in pytorch#158538
where `aten.copy_.default` is registered as a pointwise op, but without linearity.

In particular, when both `src` and `dst` tensors have same `Partial` placements, direct copy should happen without redistribute, instead of redistributing both to `Replicate` before making the copy.

This was discovered from silent incorrect results e.g. on `torch.einsum` backward.

Pull Request resolved: pytorch#162460
Approved by: https://github.com/zpcore
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025