
[shard prop] default sharding validator to 1-1 OpInfo-aten entries#177595

Closed
pianpwk wants to merge 4 commits into gh/pianpwk/123/base from gh/pianpwk/123/head

Conversation

@pianpwk
Contributor

@pianpwk pianpwk commented Mar 16, 2026

Stack from ghstack (oldest at bottom):

Taking changes from #176258

One source of sharding validator false positives/negatives has been OpInfo entries which run multiple aten ops underneath. This makes it ambiguous which aten op to check, how many inputs/outputs to expect, etc.

By default we now only run if the OpInfo-aten op mapping is 1-1, and use the aten op inputs (ignore the top-level inputs).

Alternatively, run with --allow-composite to validate ALL underlying aten ops for the OpInfo entry.

Authored with Claude.

[ghstack-poisoned]
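The 1-1 check described above can be sketched with PyTorch's `TorchDispatchMode`, which intercepts every aten op that an operation dispatches underneath. This is an illustrative sketch, not the PR's actual code; helper names like `captured_aten_ops` and `is_one_to_one` are made up here.

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class AtenOpRecorder(TorchDispatchMode):
    """Record every aten op dispatched while the mode is active."""
    def __init__(self):
        self.ops = []

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        # func is the aten OpOverload actually being dispatched
        self.ops.append(func)
        return func(*args, **(kwargs or {}))

def captured_aten_ops(fn, *args, **kwargs):
    """Run fn under the recorder and return the aten ops it dispatched."""
    with AtenOpRecorder() as rec:
        fn(*args, **kwargs)
    return rec.ops

def is_one_to_one(fn, *args, **kwargs):
    # "1-1" in the sense above: the entry dispatches exactly one aten op,
    # so the validator knows unambiguously which op (and which inputs) to check.
    return len(captured_aten_ops(fn, *args, **kwargs)) == 1
```

Composite entries (those dispatching several aten ops) would be skipped by default under this scheme, and only validated when something like `--allow-composite` is passed.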
@pytorch-bot

pytorch-bot Bot commented Mar 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177595

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit ccd65dd with merge base 985f9ee:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pianpwk added a commit that referenced this pull request Mar 16, 2026
Validate sharding rules at the aten op level using captured args/kwargs
instead of OpInfo SampleInput pytrees. Adds an exhaustive mode that
validates each supported aten op individually for decomposed ops,
eliminating silent skips for non-1:1 aten mappings. Relaxes atol to
1e-3 to accommodate accumulated floating point error.

Authored with Claude.


ghstack-source-id: 2078207
Pull-Request: #177595
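The relaxed-tolerance comparison mentioned in the commit message (atol 1e-3) could look like the following sketch. It is an assumption about the shape of the check, not the PR's code; `matches_ground_truth` is a hypothetical name.

```python
import torch

def matches_ground_truth(sharded_result: torch.Tensor,
                         ground_truth: torch.Tensor,
                         atol: float = 1e-3, rtol: float = 1e-5) -> bool:
    # Accumulated floating point error across sharded execution motivates
    # the looser atol=1e-3 (torch.allclose's default atol is 1e-8).
    return torch.allclose(sharded_result, ground_truth, atol=atol, rtol=rtol)
```

With the default atol, a result like 1.0005 vs. a ground truth of 1.0 would be flagged as a mismatch (a false negative); atol=1e-3 accepts it.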
@pytorch-bot pytorch-bot Bot added ciflow/dtensor Run DTensor specific tests ciflow/inductor ciflow/torchtitan Run TorchTitan integration tests release notes: distributed (dtensor) release notes category labels Mar 16, 2026
@pianpwk pianpwk changed the title [dtensor] Aten-level strategy validation with exhaustive mode [shard prop] default sharding validator to 1-1 OpInfo-aten entries Mar 16, 2026
return tensors


def validate_aten_combination(
Contributor


should we decompose the other validate fn into calls into this fn? (mainly to dedup code) - or would that not be as easy as it sounds?

Contributor


saw you addressed this, thanks

… entries"


Taking changes from #176258

One source of sharding validator false positives/negatives has been OpInfo entries which run multiple aten ops underneath. This makes it ambiguous which aten op to check, how many inputs/outputs to expect, etc.

By default we now only run if the OpInfo-aten op mapping is 1-1, and use the aten op inputs (ignore the top-level inputs).

Alternatively, run with `--exhaustive` to validate ALL underlying aten ops for the OpInfo entry.

Also relaxes atol to 1e-3 to reduce false negatives.

Authored with Claude.

[ghstack-poisoned]
pianpwk added a commit that referenced this pull request Mar 17, 2026

ghstack-source-id: d08a6bb
Pull Request resolved: #177595
return None


def _check_ground_truth(
Contributor


bad name? doesn't actually check the ground truth, checks whether the result is usable as a ground truth!

Contributor

@wconstab wconstab left a comment


wondering, how useful is exhaustive mode? and how much smaller/cleaner would this diff be if instead of maintaining both modes we just made the only mode the 1:1 mode?

there are a few places that seem to duplicate codepaths between modes. like preparing mitigations, and compare_ harnesses. ideally i'd rather the 'exhaustive' mode be a simple layer on top of the 1:1 mode, nothing more.

@pianpwk
Contributor Author

pianpwk commented Mar 19, 2026

wondering, how useful is exhaustive mode? and how much smaller/cleaner would this diff be if instead of maintaining both modes we just made the only mode the 1:1 mode?

hmm are you able to access all aten ops with only 1-1 mode? say entry A runs aten ops B, C, D (where C is what we care about testing) - there might not be another entry that 1-1 matches C?

@wconstab
Contributor

hmm. if you want to test C, is it even obvious how to figure out which op A to tell the validator to validate so that you get coverage for C?

Is it more useful to 'validate A', or would it be more useful to say 'find ops that decompose to C and gather their opinfo samples to directly validate C'? (which is more useful to you as a user i mean)

@pianpwk
Contributor Author

pianpwk commented Mar 19, 2026

Is it more useful to 'validate A', or would it be more useful to say 'find ops that decompose to C and gather their opinfo samples to directly validate C'? (which is more useful to you as a user i mean)

I think by "validating A", you have a good idea that it's C you want, e.g. adaptive_avg_pool2d -> aten::adaptive_avg_pool2d, even if B & D are random clone/view/detach ops.

The gathering needs some OpInfo -> aten op mapping :\
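A rough sketch of what such an OpInfo -> aten mapping could look like: trace each entry's sample inputs with `TorchDispatchMode` (a real PyTorch API) and invert the result, so that for a target aten op C one can look up every entry whose decomposition hits C. The `(name, fn, sample_args)` entry shape and the helper names here are hypothetical.

```python
import collections
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class _Recorder(TorchDispatchMode):
    """Collect the set of aten ops dispatched while active."""
    def __init__(self):
        self.ops = set()

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        self.ops.add(func)
        return func(*args, **(kwargs or {}))

def build_aten_index(entries):
    """entries: iterable of (name, fn, sample_args) tuples standing in for
    OpInfo entries. Returns {aten_op: [entry names that dispatch it]}."""
    index = collections.defaultdict(list)
    for name, fn, sample_args in entries:
        with _Recorder() as rec:
            fn(*sample_args)
        for op in rec.ops:
            index[op].append(name)
    return dict(index)
```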

Contributor

@wconstab wconstab left a comment


ok- maybe --exhaustive isn't the most informative name. --allow-composite ?

@pianpwk pianpwk added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 20, 2026
pianpwk added a commit that referenced this pull request Mar 20, 2026

ghstack-source-id: de365fb
Pull Request resolved: #177595
pianpwk added a commit that referenced this pull request Mar 30, 2026

ghstack-source-id: ff1b917
Pull Request resolved: #177595
@pianpwk
Contributor Author

pianpwk commented Mar 30, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
…ytorch#177595)

Taking changes from pytorch#176258

One source of sharding validator false positives/negatives has been OpInfo entries which run multiple aten ops underneath. This makes it ambiguous which aten op to check, how many inputs/outputs to expect, etc.

By default we now only run if the OpInfo-aten op mapping is 1-1, and use the aten op inputs (ignore the top-level inputs).

Alternatively, run with `--allow-composite` to validate ALL underlying aten ops for the OpInfo entry.

Authored with Claude.
Pull Request resolved: pytorch#177595
Approved by: https://github.com/wconstab
pytorch-bot Bot pushed a commit that referenced this pull request Apr 2, 2026
IvanKobzarev pushed a commit to IvanKobzarev/pytorch that referenced this pull request Apr 3, 2026
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
bobrenjc93 pushed a commit to bobrenjc93/pytorch that referenced this pull request Apr 9, 2026
bobrenjc93 pushed a commit to bobrenjc93/pytorch that referenced this pull request Apr 10, 2026
@github-actions github-actions Bot deleted the gh/pianpwk/123/head branch April 30, 2026 02:26

Labels

ciflow/dtensor Run DTensor specific tests ciflow/inductor ciflow/torchtitan Run TorchTitan integration tests ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (dtensor) release notes category

3 participants