
Fix inplace ops on Partial DTensors to preserve aliasing semantics#164729

Closed
RohitRathore1 wants to merge 5 commits into pytorch:main from RohitRathore1:inplace_ops_new

Conversation


@RohitRathore1 RohitRathore1 commented Oct 6, 2025

Fixes #163374.

Here is the output from reproducible code:

```
W1006 09:09:26.329000 2457 /home/fedora/github/pytorch/torch/distributed/run.py:811]
W1006 09:09:26.329000 2457 /home/fedora/github/pytorch/torch/distributed/run.py:811] *****************************************
W1006 09:09:26.329000 2457 /home/fedora/github/pytorch/torch/distributed/run.py:811] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1006 09:09:26.329000 2457 /home/fedora/github/pytorch/torch/distributed/run.py:811] *****************************************
  aten::clamp_(dt: f32[][R], None, 2)
    redistribute_input(0, [P] -> [R])
      redistribute_input(t: f32[], [P] -> [R])
        _c10d_functional::all_reduce(t: f32[], sum, 0)
        _c10d_functional::wait_tensor(t: f32[])
    aten::clamp_(t: f32[], None, 2)
    aten::view(t: f32[], [])
(Replicate(),)
tensor(2., device='cuda:0')
```

The behavior now matches what was expected in issue #163374:

Expected behavior (from the issue):

  1. Placement should change from Partial(sum) to Replicate()
  2. Value should be tensor(2.) instead of tensor(144.)

Actual output from this build:

  1. (Replicate(),) - placement is correct
  2. tensor(2., device='cuda:0') - value is correct

So the inplace operation now properly redistributes the partial DTensor to replicate before performing the clamp and maintains the correct aliasing semantics. It also produces the expected clamped value.
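Why the order of redistribution and clamp matters can be illustrated without torch. A minimal stdlib-only sketch (the shard values are invented for illustration; a real Partial(sum) DTensor stores per-rank contributions whose sum is the logical value):

```python
# Stdlib-only illustration (no torch); the shard values are made up.
# clamp_(None, 2) must see the *reduced* value, so the redistribution
# (all-reduce) has to happen before the inplace op.
shards = [100.0, 44.0]        # per-rank partial contributions; 100 + 44 = 144
full = sum(shards)            # redistribute: Partial(sum) -> Replicate
clamped = min(full, 2.0)      # aten::clamp_(t, None, 2) on the replicated value
print(clamped)                # 2.0, matching the expected tensor(2.)

# Clamping each shard first and then reducing gives a different answer:
wrong = sum(min(s, 2.0) for s in shards)  # 2.0 + 2.0 = 4.0
```

This is only an analogy for the ordering constraint, not the actual DTensor code path.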

cc: @SherlockNoMad @ezyang @janeyx99 @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci



pytorch-bot bot commented Oct 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164729

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit ec907a6 with merge base 573a79f:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Oct 6, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@pytorch-bot pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Oct 6, 2025
@RohitRathore1
Collaborator Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 6, 2025
@janeyx99 janeyx99 requested a review from tianyu-l October 7, 2025 21:56
@janeyx99 janeyx99 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Oct 7, 2025
@fduwjj fduwjj requested review from XilunWu and zpcore October 8, 2025 18:27

fduwjj commented Oct 8, 2025

lgtm but I also cced two more DTensor people. If they have not reviewed it by next week, I will stamp on this PR.

@RohitRathore1
Collaborator Author

@fduwjj @zpcore if everything looks good here, could you please approve and merge this PR? It has passed all the tests, thanks!


@ezyang ezyang left a comment


I don't think this is right. The problem is if there is a view into the partial tensor; just overwriting the local tensor is not enough to cause the views to update. In fact, I tend to think you are just toast if you need a redistribute on the tensor being inplace'd. I'd look more positively on something that just errors when this happens.
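The aliasing hazard described here can be sketched without torch: rebinding the underlying storage (as an assignment like `dt._local_tensor = new_tensor` would) leaves existing views pointing at the old storage, whereas a true inplace write is visible through them. A minimal stdlib analogy, with `bytearray` standing in for the local tensor's storage and `memoryview` for a view into it:

```python
# Stdlib analogy for the aliasing hazard; not actual DTensor code.
buf = bytearray(b"\x01\x02")
view = memoryview(buf)        # aliases buf's storage, like a tensor view

buf[0] = 9                    # a real inplace write...
assert view[0] == 9           # ...is visible through the view

buf = bytearray(b"\x07\x08")  # rebinding, like overwriting _local_tensor
assert view[0] == 9           # the old view still sees the old storage
```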


RohitRathore1 commented Oct 15, 2025

I don't think this is right. The problem is if there is a view into the partial tensor; just overwriting the local tensor is not enough to cause the views to update. In fact, I tend to think you are just toast if you need a redistribute on the tensor being inplace'd. I'd look more positively on something that just errors when this happens.

Thanks for the review @ezyang! You're absolutely right about the aliasing issue. I hadn't considered the case where there might be views into the partial tensor i.e., simply overwriting _local_tensor would break those views. So instead of trying to fix this with _local_tensor assignment, I should just error out when an inplace operation requires redistribution. If this approach sounds good, let me update the PR to raise a RuntimeError in this case and adjust the test accordingly. Should the error be raised for all inplace ops that need redistribution, or are there specific cases where it would be safe?
cc: @fduwjj @zpcore @XilunWu

@RohitRathore1
Collaborator Author

@ezyang should I go ahead with this fix?


ezyang commented Oct 17, 2025

sure

@RohitRathore1
Collaborator Author

sure

@ezyang Thanks, I have updated my PR.


# update the spec for all inplace ops to handle placement changes
# that don't require redistribution
args[0]._spec = output_spec

I think it's still OK (and in fact preferred) for the squeeze_ logic to live inside the squeeze_ branch. Most inplace ops do NOT change the tensor meta. So because no redistribution could have occurred, the input cannot have a placement change.
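A rough sketch of the structure being suggested, with invented names (this is not the actual DTensor dispatch code): only `squeeze_` mutates the tensor meta, so only its branch refreshes the cached spec.

```python
# Hypothetical sketch: the spec update lives inside the squeeze_ branch,
# because most inplace ops leave the tensor meta unchanged.
class FakeDTensor:
    def __init__(self, spec):
        self._spec = spec

def handle_inplace(op_name, dt, output_spec):
    if op_name == "aten.squeeze_.dim":
        # squeeze_ changes the tensor meta, so the spec must follow it
        dt._spec = output_spec
    # other inplace ops: no meta change, and since no redistribution
    # occurred, the input's placement cannot have changed either
    return dt

dt = FakeDTensor(spec="Partial(sum)")
handle_inplace("aten.clamp_", dt, output_spec="Replicate()")
print(dt._spec)  # still Partial(sum): meta-preserving ops leave the spec alone
```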

Collaborator Author


Thanks @ezyang! You're absolutely right. I'll move the args[0]._spec = output_spec inside the squeeze_.dim branch.


ezyang commented Oct 27, 2025

Just a small tweak please, thanks

@RohitRathore1
Collaborator Author

Just a small tweak please, thanks

Done, thanks!


ezyang commented Oct 29, 2025

need to update tests

@RohitRathore1
Collaborator Author

need to update tests

@ezyang are you referring to the failing test, or do I need to update an existing test?


ezyang commented Nov 3, 2025

your ci failed, fix it

@RohitRathore1
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased inplace_ops_new onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout inplace_ops_new && git pull --rebase)

@RohitRathore1
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

Per @ezyang's feedback, inplace operations that require redistribution
(e.g., Partial -> Replicate) now raise a RuntimeError instead of trying
to work around the issue by overwriting _local_tensor.
@pytorchmergebot
Collaborator

Successfully rebased inplace_ops_new onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout inplace_ops_new && git pull --rebase)

@RohitRathore1
Collaborator Author

@ezyang I've fixed the CI failure. The issue was that the placement change check was happening before the squeeze_.dim special case, so I moved it into the else block to exempt squeeze_.dim. I also removed the spec update for other inplace ops, as you mentioned: since most inplace ops don't change the tensor meta, and no redistribution occurs when placements match, the spec update isn't needed. Let me know if there's anything else that needs to be addressed.
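The guard described above can be sketched roughly as follows, with invented names (the real check lives inside DTensor's dispatch logic): an inplace op whose input would need redistribution errors out instead of silently rebinding the local tensor and breaking views.

```python
# Hypothetical guard, not the actual PyTorch implementation: raise when an
# inplace op would require redistribution (e.g. Partial -> Replicate).
def check_inplace_placements(op_name, input_placements, required_placements):
    if tuple(input_placements) != tuple(required_placements):
        raise RuntimeError(
            f"{op_name}: in-place operations that require redistribution "
            f"({input_placements} -> {required_placements}) are not "
            "supported, as they cannot preserve aliasing with existing views"
        )

# Matching placements pass through without error:
check_inplace_placements("aten.clamp_", ("Replicate()",), ("Replicate()",))
```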


ezyang commented Nov 14, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 14, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status.

Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
…ytorch#164729)

Pull Request resolved: pytorch#164729
Approved by: https://github.com/ezyang

Development

Successfully merging this pull request may close these issues.

[DTensor] Inplace ops produces wrong result

7 participants