[shard prop] default sharding validator to 1-1 OpInfo-aten entries #177595

pianpwk wants to merge 4 commits into gh/pianpwk/123/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177595
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure) As of commit ccd65dd with merge base 985f9ee. BROKEN TRUNK - The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Validate sharding rules at the aten op level using captured args/kwargs instead of OpInfo SampleInput pytrees. Adds an exhaustive mode that validates each supported aten op individually for decomposed ops, eliminating silent skips for non-1:1 aten mappings. Relaxes atol to 1e-3 to accommodate accumulated floating point error. Authored with Claude. ghstack-source-id: 2078207 Pull-Request: #177595
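A minimal sketch of the exhaustive-mode idea described in the commit message: rather than validating an OpInfo entry as a whole, each captured aten call is validated individually with its own args/kwargs, so non-1:1 mappings are no longer silently skipped. All names here (`validate_each_aten_call`, `validate_one`) are illustrative stand-ins, not the PR's actual helpers; a real run would apply the DTensor sharding rule and compare against the eager result.

```python
def validate_each_aten_call(captured_calls, validate_one):
    """captured_calls: list of (op_name, args, kwargs) tuples recorded while
    the OpInfo sample ran. validate_one: callback that checks a single aten op.
    Returns a per-op pass/fail dict instead of silently skipping entries whose
    OpInfo-aten mapping is not 1-1."""
    results = {}
    for op_name, args, kwargs in captured_calls:
        results[op_name] = validate_one(op_name, args, kwargs)
    return results


# Hypothetical captured trace for one OpInfo sample input
calls = [
    ("aten::clone", (1,), {}),
    ("aten::adaptive_avg_pool2d", (2, 3), {}),
]
report = validate_each_aten_call(calls, lambda name, a, k: True)
```

Each op gets its own entry in `report`, which is what lets exhaustive mode surface failures in ops that only ever appear inside decompositions.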
    return tensors
    ...
def validate_aten_combination(
should we decompose the other validate fn into calls into this fn? (mainly to dedup code) - or would that not be as easy as it sounds?
saw you addressed this, thanks
… entries" Taking changes from #176258 One source of sharding validator false positives/negatives has been OpInfo entries which run multiple aten ops underneath. This misleads the aten op to check, the # of inputs/output, etc. By default we now only run if the OpInfo-aten op mapping is 1-1, and use the aten op inputs (ignore the top-level inputs). Alternatively, run with `--exhaustive` to validate ALL underlying aten ops for the OpInfo entry. Also relaxes atol to 1e-3 to loosen false negatives Authored with Claude. [ghstack-poisoned]
    return None
    ...
def _check_ground_truth(
bad name? doesn't actually check the ground truth, checks whether the result is usable as a ground truth!
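The semantics under discussion could be sketched like this: the helper doesn't compare anything against a ground truth, it decides whether a result is *usable* as one, e.g. fully finite. The name, the flat-list input, and the finiteness criterion are all illustrative assumptions, not the validator's actual logic.

```python
import math

def is_usable_as_ground_truth(values):
    """Return True if every element is finite (no NaN/inf), so the result
    can serve as a reference when comparing sharded outputs against it."""
    return all(math.isfinite(v) for v in values)
```

Under this reading a name like `is_usable_as_ground_truth` (or similar) states the contract more honestly than `_check_ground_truth`.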
wconstab
left a comment
wondering, how useful is exhaustive mode? and how much smaller/cleaner would this diff be if instead of maintaining both modes we just made the only mode the 1:1 mode?
there are a few places that seem to duplicate codepaths between modes. like preparing mitigations, and compare_ harnesses. ideally i'd rather the 'exhaustive' mode be a simple layer on top of the 1:1 mode, nothing more.
hmm are you able to access all aten ops with only 1-1 mode? say entry A runs aten ops B, C, D (where C is what we care about testing) - there might not be another entry that 1-1 matches C?

hmm. if you want to test C, is it even obvious how to figure out which op A to tell the validator to validate so that you get coverage for C? Is it more useful to 'validate A', or would it be more useful to say 'find ops that decompose to C and gather their opinfo samples to directly validate C'? (which is more useful to you as a user i mean)

I think by "validating A", you have a good idea that it's C you want, e.g. adaptive_avg_pool2d -> aten::adaptive_avg_pool2d, even if B & D are random clone/view/detach ops. The gathering needs some OpInfo -> aten op mapping :\
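The 1-1 check being debated here can be sketched as follows: entry A runs aten ops B, C, D, and the mapping counts as 1-1 when only one "meaningful" op remains after ignoring incidental clone/view/detach calls. The skip-list contents and the helper name are assumptions for illustration, not the validator's actual implementation.

```python
# Assumed skip-list of incidental ops that don't affect the 1-1 decision
TRIVIAL_ATEN_OPS = {"aten::clone", "aten::view", "aten::detach"}

def one_to_one_target(captured_op_names):
    """Return the single meaningful aten op if the OpInfo-aten mapping is
    effectively 1-1, else None (entry skipped in the default mode)."""
    meaningful = [op for op in captured_op_names if op not in TRIVIAL_ATEN_OPS]
    return meaningful[0] if len(meaningful) == 1 else None
```

So `adaptive_avg_pool2d` surrounded by clone/detach noise still qualifies as 1-1, while an entry that runs two meaningful ops returns None and is deferred to composite mode.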
wconstab
left a comment
ok- maybe --exhaustive isn't the most informative name. --allow-composite ?
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…177595) Taking changes from #176258 One source of sharding validator false positives/negatives has been OpInfo entries which run multiple aten ops underneath. This misleads the aten op to check, the # of inputs/output, etc. By default we now only run if the OpInfo-aten op mapping is 1-1, and use the aten op inputs (ignore the top-level inputs). Alternatively, run with `--allow-composite` to validate ALL underlying aten ops for the OpInfo entry. Authored with Claude. Pull Request resolved: #177595 Approved by: https://github.com/wconstab
Stack from ghstack (oldest at bottom):
Taking changes from #176258
One source of sharding validator false positives/negatives has been OpInfo entries which run multiple aten ops underneath. This misleads the validator about which aten op to check, the number of inputs/outputs, etc.
By default we now only run if the OpInfo-aten op mapping is 1-1, and use the aten op inputs (ignore the top-level inputs).
Alternatively, run with `--allow-composite` to validate ALL underlying aten ops for the OpInfo entry.

Authored with Claude.
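The commit messages above also mention relaxing atol to 1e-3 to absorb accumulated floating-point error from decomposed ops. A minimal plain-Python stand-in for that comparison (real code would use `torch.allclose` on tensors; the function name and rtol default here are illustrative):

```python
def close_enough(sharded, reference, atol=1e-3, rtol=1e-5):
    """Elementwise |s - r| <= atol + rtol * |r|, with atol loosened to 1e-3
    so small accumulated floating-point drift doesn't flag false negatives."""
    return all(
        abs(s - r) <= atol + rtol * abs(r)
        for s, r in zip(sharded, reference)
    )
```

With the tighter default atol a drift of 5e-4 would have failed; under 1e-3 it passes, while a genuine 1e-2 discrepancy still fails.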