add and fix OpInfo tests for the default partitioner #165372

Closed

bdhirsh wants to merge 6 commits into gh/bdhirsh/674/base from gh/bdhirsh/674/head

Conversation


@bdhirsh bdhirsh commented Oct 13, 2025

I noticed the default partitioner was breaking in some dynamic shape tests, so before turning off functionalization I want to tweak it to pass all of our OpInfo tests.

Stack from ghstack (oldest at bottom):


pytorch-bot bot commented Oct 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165372

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 90343a5 with merge base e787d53:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD albanD removed their request for review October 14, 2025 14:56
elif (
    "tensor_meta" not in node.meta
    and node.op == "call_function"
    and not isinstance(node.meta.get("val"), torch._subclasses.FakeTensor)
):
Contributor

why not just test Tensor?
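To make the condition under review concrete, here is a torch-free sketch of the guard. `Node`, `FakeTensor`, and the `needs_tensor_meta` name are stand-in mocks for illustration, not the actual partitioner code; the real check inspects `torch.fx` nodes and `torch._subclasses.FakeTensor`.

```python
class FakeTensor:  # stand-in for torch._subclasses.FakeTensor
    pass


class Node:  # stand-in for a torch.fx.Node
    def __init__(self, op, meta):
        self.op = op
        self.meta = meta


def needs_tensor_meta(node):
    # Mirrors the elif condition above: a call_function node with no
    # cached "tensor_meta" whose "val" entry is not a FakeTensor.
    return (
        "tensor_meta" not in node.meta
        and node.op == "call_function"
        and not isinstance(node.meta.get("val"), FakeTensor)
    )


missing = Node("call_function", {"val": object()})
cached = Node("call_function", {"tensor_meta": object(), "val": FakeTensor()})
placeholder = Node("placeholder", {})

print(needs_tensor_meta(missing))      # True
print(needs_tensor_meta(cached))       # False
print(needs_tensor_meta(placeholder))  # False
```

The reviewer's question is about the third clause: testing for `FakeTensor` specifically, rather than for `Tensor` in general.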


bdhirsh commented Oct 14, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 14, 2025
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here.

zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 15, 2025
I noticed the default partitioner was breaking in some dynamic shape tests, so prior to turning off functionalization I want to tweak it to pass all of our OpInfo tests

Pull Request resolved: pytorch#165372
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#165327

malfet commented Oct 15, 2025

@pytorchbot revert -m "Looks like it broke slow jobs, see https://hud.pytorch.org/hud/pytorch/pytorch/331b7cc054415210ec73f4e7e4571f8a0c21ed62/1?per_page=50&name_filter=slow&mergeEphemeralLF=true" -c nosignal

@pytorchmergebot

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot

@bdhirsh your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Oct 15, 2025
before:
    decorator=toleranceOverride({torch.float32: tol(atol=1e-05, rtol=1e-05)}),
after:
    # This delta is coming entirely from the clone() on tangents
    # in AOTDispatcher to make them contiguous
    decorator=toleranceOverride({torch.float32: tol(atol=4e-05, rtol=1e-05)}),
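A minimal sketch of why bumping `atol` from 1e-05 to 4e-05 matters, using the allclose-style bound `|a - b| <= atol + rtol * |b|` (the same shape of check `torch.allclose` documents). The drift value here is hypothetical, chosen only to fall between the two bounds.

```python
def within_tol(actual, expected, atol, rtol):
    # allclose-style elementwise tolerance check
    return abs(actual - expected) <= atol + rtol * abs(expected)


eager = 0.5
aot_eager = eager + 2.5e-05  # hypothetical drift from the contiguous clone()

print(within_tol(aot_eager, eager, atol=1e-05, rtol=1e-05))  # False: old bound
print(within_tol(aot_eager, eager, atol=4e-05, rtol=1e-05))  # True: new bound
```

With the old settings the allowed error is 1e-05 + 1e-05 * 0.5 = 1.5e-05, so a 2.5e-05 drift fails; the new `atol` raises the bound to 4.5e-05 and the sample passes.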
Collaborator Author

fyi the slow-test failure that caused the revert was interesting:

(1) this test wobbled tolerance a bit:

 PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=32 PYTORCH_TEST_WITH_SLOW=1 PYTORCH_TEST_SKIP_FAST=1 python test/functorch/test_aotdispatch.py TestEagerFusionOpInfoCPU.test_aot_autograd_symbolic_default_partition_exhaustive_linalg_pinv_singular_cpu_float32

(2) I used @SherlockNoMad's handy DebugMode to find the kernels that differ between eager and aot_eager, and I found the difference comes entirely from AOTDispatcher emitting a clone() on tangents in the backward (presumably this causes us to call linalg.pinv's backward with different striding, and the op's numerics are sensitive to strides).

We're going to have to deal with this in the "bitwise equality" workstream. I don't want to deal with it in this PR, but we might want to consider not running the clone at all and instead raising an error if we got the strides of our tangents wrong, i.e. requiring the user to tell us what striding they want for the tangents.
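The stride effect described above can be sketched with numpy: `np.ascontiguousarray` plays the role of AOTDispatcher's contiguity-forcing clone(). The values are identical, but the memory layout (strides) changes, which is exactly the kind of difference a stride-sensitive backward kernel can turn into small numeric drift. The variable names are illustrative, not from the PR.

```python
import numpy as np

# A transposed view is non-contiguous: same data, permuted strides.
tangent = np.arange(12.0).reshape(3, 4).T

# Analogue of clone(memory_format=torch.contiguous_format) on a tangent.
contiguous = np.ascontiguousarray(tangent)

print(np.array_equal(tangent, contiguous))    # True: identical values
print(tangent.strides == contiguous.strides)  # False: different layout
print(tangent.flags["C_CONTIGUOUS"], contiguous.flags["C_CONTIGUOUS"])
```

A kernel that walks memory in stride order may accumulate partial sums in a different order for the two layouts, so results can differ by a few ULPs even though the inputs are equal.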

@pytorchmergebot

Starting merge as part of PR stack under #164577

pytorchmergebot pushed a commit that referenced this pull request Oct 16, 2025
I'm cleaning this PR up as a proper way of disabling functionalization via config in AOTDispatcher. I removed the non-functionalization-related changes from the original version:

(1) preventing proxy mode (and functionalization) from incorrectly decomposing CIA ops (Ed has a PR for it here: #164939)

(2) preventing python-dispatcher-based decomps above autograd from running. I'm not doing this for now, will likely do it in a followup

Pull Request resolved: #164577
Approved by: https://github.com/ezyang
ghstack dependencies: #165372
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 22, 2025
zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 22, 2025
@github-actions github-actions bot deleted the gh/bdhirsh/674/head branch November 16, 2025 02:21

Labels

ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/slow, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, release notes: composability (release notes category), Reverted

4 participants