add and fix OpInfo tests for the default partitioner #165372
bdhirsh wants to merge 6 commits into gh/bdhirsh/674/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165372
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 90343a5 with merge base e787d53. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
I noticed the default partitioner was breaking in some dynamic shape tests, so prior to turning off functionalization I want to tweak it to pass all of our OpInfo tests. [ghstack-poisoned]
```python
elif (
    "tensor_meta" not in node.meta
    and node.op == "call_function"
    and not isinstance(node.meta.get("val"), torch._subclasses.FakeTensor)
```
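For readers skimming the diff: the condition matches call_function nodes that carry no tensor metadata and whose recorded value is not a FakeTensor, e.g. nodes that produce SymInts under dynamic shapes. A minimal, torch-free sketch of that predicate, where `Node` and `FakeTensor` are hypothetical stand-ins for `torch.fx.Node` and `torch._subclasses.FakeTensor`:

```python
# Hedged sketch of the condition in the diff above. `Node` and `FakeTensor`
# are minimal stand-ins for torch.fx.Node and torch._subclasses.FakeTensor,
# used only for illustration.
class FakeTensor:
    pass

class Node:
    def __init__(self, op, meta):
        self.op = op
        self.meta = meta

def is_non_tensor_call(node):
    # True for call_function nodes with no tensor metadata whose value is
    # not a FakeTensor, e.g. nodes producing SymInts under dynamic shapes.
    return (
        "tensor_meta" not in node.meta
        and node.op == "call_function"
        and not isinstance(node.meta.get("val"), FakeTensor)
    )

sym_node = Node("call_function", {"val": 7})  # e.g. a SymInt-producing op
tensor_node = Node("call_function", {"tensor_meta": object(), "val": FakeTensor()})
```

This is a sketch of the shape of the check, not the real partitioner code, which handles more cases than the excerpt shows.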
@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
I noticed the default partitioner was breaking in some dynamic shape tests, so prior to turning off functionalization I want to tweak it to pass all of our OpInfo tests.

Pull Request resolved: pytorch#165372
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#165327
@pytorchbot revert -m "Looks like it broke slow jobs, see https://hud.pytorch.org/hud/pytorch/pytorch/331b7cc054415210ec73f4e7e4571f8a0c21ed62/1?per_page=50&name_filter=slow&mergeEphemeralLF=true" -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
@bdhirsh your PR has been successfully reverted.
This reverts commit bcfea48. Reverted #165372 on behalf of https://github.com/malfet due to: Looks like it broke slow jobs, see https://hud.pytorch.org/hud/pytorch/pytorch/331b7cc054415210ec73f4e7e4571f8a0c21ed62/1?per_page=50&name_filter=slow&mergeEphemeralLF=true
test/functorch/test_aotdispatch.py (comment on an outdated diff)
```diff
-decorator=toleranceOverride({torch.float32: tol(atol=1e-05, rtol=1e-05)}),
+# This delta is coming entirely from the clone() on tangents
+# in AOTDispatcher to make them contiguous
+decorator=toleranceOverride({torch.float32: tol(atol=4e-05, rtol=1e-05)}),
```
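For context on what widening `atol` from 1e-05 to 4e-05 buys: allclose-style checks of the kind `toleranceOverride` configures pass when `|actual - expected| <= atol + rtol * |expected|`. A pure-Python illustration, where the per-element delta is a made-up stand-in for the error introduced by the extra clone() on tangents:

```python
# Hedged illustration of an allclose-style (atol, rtol) check; the delta
# below is a hypothetical error value, not measured from the actual test.
def within_tol(actual, expected, atol, rtol):
    return abs(actual - expected) <= atol + rtol * abs(expected)

expected = 1.0
actual = expected + 2.5e-5  # hypothetical error from changed striding

old = within_tol(actual, expected, atol=1e-5, rtol=1e-5)  # threshold 2e-5: fails
new = within_tol(actual, expected, atol=4e-5, rtol=1e-5)  # threshold 5e-5: passes
```

So an error of a few 1e-05 per element trips the old tolerance but sits comfortably inside the new one.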
FYI, the slow-test failure that caused the revert was interesting:

(1) this test wobbled tolerance a bit:

```
PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=32 PYTORCH_TEST_WITH_SLOW=1 PYTORCH_TEST_SKIP_FAST=1 python test/functorch/test_aotdispatch.py TestEagerFusionOpInfoCPU.test_aot_autograd_symbolic_default_partition_exhaustive_linalg_pinv_singular_cpu_float32
```

(2) I used @SherlockNoMad's handy DebugMode to find the kernels that differ between eager and aot_eager, and the difference comes entirely from AOTDispatcher emitting a clone() on tangents in the backward (presumably this causes us to call linalg.pinv's backward with different striding, and the op's numerics are sensitive to strides).

We're going to have to deal with this in the "bitwise equality" workstream. I don't want to deal with it in this PR, but we might want to consider not running the clone at all and instead raising an error if we got the strides of our tangents wrong, i.e. require the user to tell us what striding they want for the tangents.
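To make the stride effect concrete, here is a hedged, torch-free sketch of why a clone() to contiguous memory changes a tangent's strides: a transposed view reuses the original storage with swapped strides, while a clone materializes fresh row-major storage for the transposed shape.

```python
# Hedged sketch, pure Python (no torch): stride bookkeeping for a 2x3
# tensor, its transpose view, and a contiguous clone of that transpose.
def contiguous_strides(shape):
    # Row-major (C-contiguous) strides in elements, the convention torch uses.
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.insert(0, acc)
        acc *= dim
    return tuple(strides)

base = contiguous_strides((2, 3))     # (3, 1)
view_t = tuple(reversed(base))        # transpose view: (1, 3), for shape (3, 2)
clone_t = contiguous_strides((3, 2))  # after clone(): (2, 1)
```

A backward kernel that receives a tangent with strides (2, 1) instead of (1, 3) can take a different code path, which is plausibly where the linalg.pinv numerics drift comes from.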
Starting merge as part of PR stack under #164577
I'm cleaning this PR up as a proper way of disabling functionalization via config in AOTDispatcher. I removed the non-functionalization-related changes from the original version:

(1) preventing proxy mode (and functionalization) from incorrectly decomposing CIA ops (Ed has a PR for it here: #164939)

(2) preventing python-dispatcher-based decomps above autograd from running. I'm not doing this for now; will likely do it in a followup.

Pull Request resolved: #164577
Approved by: https://github.com/ezyang
ghstack dependencies: #165372