[DTensor] Strategy Validation (4/4): Multi-output ops#174995
[DTensor] Strategy Validation (4/4): Multi-output ops#174995wconstab wants to merge 5 commits intogh/wconstab/533/basefrom
Conversation
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174995
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 Unrelated Failures)As of commit 6e797cf with merge base 003e05b ( UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). ghstack-source-id: 1b6f484 Pull Request resolved: #174995
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). [ghstack-poisoned]
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). ghstack-source-id: 942080a Pull Request resolved: #174995
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). [ghstack-poisoned]
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). ghstack-source-id: abe3182 Pull Request resolved: #174995
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). [ghstack-poisoned]
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). ghstack-source-id: 63dd244 Pull Request resolved: #174995
| if isinstance(combination.output_placement, Replicate): | ||
| local_values = [local_out._local_tensors[r] for r in range(world_size)] | ||
| all_same = all( | ||
| torch.allclose(local_values[0], lv, atol=1e-5, rtol=1e-5) | ||
| for lv in local_values[1:] | ||
| ) | ||
| if not all_same: | ||
| return ( | ||
| False, | ||
| f"Replicate output[{i}] but local values differ across ranks", | ||
| ) |
There was a problem hiding this comment.
Do we need this part? I think we can use the same logic below regardless of the combinations.output_placement.
There was a problem hiding this comment.
this looks like an additional check that forces that local values are the same across ranks before redistribute, rather than just checking that after redistribute to full_tensor the rank0 value is correct. I guess it is possible this is a useful check, but i don't see how it is specifically related to the multi-output support. i'm asking claude to explain it
There was a problem hiding this comment.
oh, it was an existing check, just moved inside the for loop over ground truths. i think this is OK
There was a problem hiding this comment.
I think if we remove this part, everything should still work.
| output_plc = spec.output_spec.placements[0] | ||
| if isinstance(spec.output_specs, tuple): | ||
| first_output_spec = spec.output_specs[0] | ||
| if first_output_spec is None: |
There was a problem hiding this comment.
In which case the first_output_spec is None? I thought you already assume if the output is tuple, it should be tuple[Tensor].
There was a problem hiding this comment.
yea, i am gonna harden _is_tensor_output so it asserts if there is a mixture of tensor and non-tensor types in a tuple. this will make this check unnecessary for now, but later if we want to support such an op (e.g. SDPA) that may return none, then we need to change this logic.
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). [ghstack-poisoned]
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). ghstack-source-id: 6b77770 Pull Request resolved: #174995
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). Pull Request resolved: #174995 Approved by: https://github.com/pianpwk, https://github.com/zpcore ghstack dependencies: #174799, #174800
Support multi-output ops like split, unbind, topk, sort. Tested for these ops and things look reasonable (not an exhaustive test of all multi-output ops): - unbind: 0 true positives because its strategy unshards the unbind dimension, so all non-trivial rules involve Replicate inputs → skipped. This is correct behavior (the validator only tests non-fully-replicated combos). - topk: 14 true positives, 0 false positives - sort: 102 true positives, 0 false positives - split_with_sizes: 24 true positives, 0 false positives - chunk: 18 true positives, 0 false positives No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks). Pull Request resolved: pytorch#174995 Approved by: https://github.com/pianpwk, https://github.com/zpcore ghstack dependencies: pytorch#174799, pytorch#174800
Stack from ghstack (oldest at bottom):
Support multi-output ops like split, unbind, topk, sort.
Tested for these ops and things look reasonable (not an exhaustive test
of all multi-output ops):
No unexpected issues with any of the multi-output operators. The implementation handles all of them correctly — single-output and
multi-output ops with varying tuple sizes (unbind's dynamic N outputs, topk/sort's 2-element tuples, split's variable chunks).