Skip to content

[DTensor] Strategy Validation (4/4): Multi-output ops#174995

Closed
wconstab wants to merge 5 commits intogh/wconstab/533/basefrom
gh/wconstab/533/head
Closed

[DTensor] Strategy Validation (4/4): Multi-output ops#174995
wconstab wants to merge 5 commits intogh/wconstab/533/basefrom
gh/wconstab/533/head

Conversation

@wconstab
Copy link
Copy Markdown
Contributor

@wconstab wconstab commented Feb 13, 2026

Stack from ghstack (oldest at bottom):

Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  • unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  • topk: 14 true positives, 0 false positives
  • sort: 102 true positives, 0 false positives
  • split_with_sizes: 24 true positives, 0 false positives
  • chunk: 18 true positives, 0 false positives

No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

[ghstack-poisoned]
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Feb 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174995

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 6e797cf with merge base 003e05b (image):

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Feb 13, 2026
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

ghstack-source-id: 1b6f484
Pull Request resolved: #174995
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Feb 13, 2026
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

ghstack-source-id: 942080a
Pull Request resolved: #174995
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Feb 14, 2026
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

ghstack-source-id: abe3182
Pull Request resolved: #174995
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Feb 17, 2026
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

ghstack-source-id: 63dd244
Pull Request resolved: #174995
@wconstab wconstab requested review from pianpwk and zpcore February 17, 2026 17:57
Comment on lines +534 to +544
if isinstance(combination.output_placement, Replicate):
local_values = [local_out._local_tensors[r] for r in range(world_size)]
all_same = all(
torch.allclose(local_values[0], lv, atol=1e-5, rtol=1e-5)
for lv in local_values[1:]
)
if not all_same:
return (
False,
f"Replicate output[{i}] but local values differ across ranks",
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this part? I think we can use the same logic below regardless of the combinations.output_placement.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like an additional check that forces that local values are the same across ranks before redistribute, rather than just checking that after redistribute to full_tensor the rank0 value is correct. I guess it is possible this is a useful check, but i don't see how it is specifically related to the multi-output support. i'm asking claude to explain it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, it was an existing check, just moved inside the for loop over ground truths. i think this is OK

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we remove this part, everything should still work.

output_plc = spec.output_spec.placements[0]
if isinstance(spec.output_specs, tuple):
first_output_spec = spec.output_specs[0]
if first_output_spec is None:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case the first_output_spec is None? I thought you already assume if the output is tuple, it should be tuple[Tensor].

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i am gonna harden _is_tensor_output so it asserts if there is a mixture of tensor and non-tensor types in a tuple. this will make this check unnecessary for now, but later if we want to support such an op (e.g. SDPA) that may return none, then we need to change this logic.

Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Feb 17, 2026
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).

ghstack-source-id: 6b77770
Pull Request resolved: #174995
@wconstab
Copy link
Copy Markdown
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 18, 2026
@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

norx1991 pushed a commit that referenced this pull request Feb 24, 2026
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).
Pull Request resolved: #174995
Approved by: https://github.com/pianpwk, https://github.com/zpcore
ghstack dependencies: #174799, #174800
@github-actions github-actions Bot deleted the gh/wconstab/533/head branch March 20, 2026 02:23
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Support multi-output ops like split, unbind, topk, sort.

Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):

  - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos).
  - topk: 14 true positives, 0 false positives
  - sort: 102 true positives, 0 false positives
  - split_with_sizes: 24 true positives, 0 false positives
  - chunk: 18 true positives, 0 false positives

  No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
   multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).
Pull Request resolved: pytorch#174995
Approved by: https://github.com/pianpwk, https://github.com/zpcore
ghstack dependencies: pytorch#174799, pytorch#174800
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (dtensor) release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants