fix grad thrashing of shape analysis #40939

eellison wants to merge 2 commits into gh/eellison/85/base
Conversation
[ghstack-poisoned]
💊 CI failures summary (Dr. CI) — As of commit 9e2e6d2: 💚 Looks good so far! There are no failures yet. 💚
```diff
  // its most constrained form.
  if (stack[i].isTensor())
-   node->outputs()[i]->inferTypeFrom(stack[i].toTensor());
+   auto tensor_type = node->outputs()[i]->type()->cast<TensorType>();
```
Reviewer: If we get rid of inferTypeFrom, we might be losing some useful information about tensor properties, which is probably the reason we tried running the operation in the first place. Shouldn't we do inferTypeFrom()->withRequiresGrad()?
Author (eellison): Yes, that's what we're doing with TensorType::create(stack[i].toTensor()).
Author (eellison): inferTypeFrom calls TensorType::create internally.
Author (eellison): I don't think we can call withRequiresGrad, because that would discard grad analysis that has already set it to false. I think we just want to preserve whatever our previous grad inference was.
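For readers following along, here is a minimal sketch of the approach described above: re-infer shape/dtype/device from the representative tensor, but carry over the previous grad inference, which may be unknown. This is not the PR's exact diff; it assumes the JIT TensorType API (TensorType::create, requiresGrad() returning c10::optional<bool>, and withRequiresGrad), and it slots into the output loop shown in the hunk above:

```cpp
if (stack[i].isTensor()) {
  torch::jit::Value* output = node->outputs()[i];
  // Whatever the previous analysis (if any) concluded about grad state;
  // requiresGrad() may be true, false, or unknown (c10::nullopt).
  auto prev_type = output->type()->cast<TensorType>();
  // Infer shape/dtype/device from the representative tensor -- this is
  // what inferTypeFrom does under the hood via TensorType::create.
  auto inferred = TensorType::create(stack[i].toTensor());
  if (prev_type) {
    // Carry the old requires_grad over verbatim instead of letting the
    // representative input (which never requires grad) clobber it.
    inferred = inferred->withRequiresGrad(prev_type->requiresGrad());
  }
  output->setType(inferred);
}
```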
```diff
+ torch._C._jit_pass_complete_shape_analysis(foo.graph, (torch.tensor([0.39]),), False)
+
+ # requires_grad property shouldn't be accidentally set by shape analysis
+ self.assertTrue(foo.graph.findNode("aten::sub").output().requiresGrad() is None)
```
Reviewer: Couldn't we actually infer, in this particular case, that aten::sub doesn't need requires_grad?
Author (eellison): No. We haven't done grad analysis, so we don't know whether it requires grad or not.
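For context, the grad property on a TensorType is tri-state, which is why the test expects None rather than False. An illustrative sketch (not code from the PR), using c10::optional directly:

```cpp
#include <c10/util/Optional.h>

int main() {
  // TensorType stores requires_grad as a tri-state c10::optional<bool>.
  // Shape analysis may refine sizes and dtype, but until grad analysis
  // runs, the optional stays unset -- surfacing as None in Python from
  // Value's requiresGrad().
  c10::optional<bool> requires_grad = c10::nullopt; // "unknown"
  return requires_grad.has_value() ? 0 : 1;         // 1: still unresolved
}
```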
Summary:
Pull Request resolved: pytorch#40939

Previously, when we would do shape analysis by running the op with representative inputs, we would always set the grad property to false. This led to a wrong static analysis when we would create differentiable subgraphs, and propagate shapes without also propagating requires_grad, and then uninline them.

Test Plan: Imported from OSS
Differential Revision: D22394676
Pulled By: eellison
fbshipit-source-id: 254e6e9f964b40d160befe0e125abe1b7aa2bd5e
* [JIT] fix unfold shape analysis (#40749)
  Summary: unfold on a 0-dimensioned tensor returns a 1-dim tensor
  Pull Request resolved: #40749
  Differential Revision: D22361481
  Pulled By: eellison
  fbshipit-source-id: 621597e5f97f6e39953eb86f8b85bb4142527a9f

* shape analysis fix for default dtype
  ghstack-source-id: 723aa27
  Pull Request resolved: #40938

* fix grad thrashing of shape analysis
  ghstack-source-id: dd8742b
  Pull Request resolved: #40939

Co-authored-by: Elias Ellison <eellison@fb.com>
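As an aside, the unfold fact from #40749 is easy to check from C++. A hedged ATen sketch (assumes at::scalar_tensor and Tensor::unfold, both standard ATen calls):

```cpp
#include <ATen/ATen.h>
#include <iostream>

int main() {
  at::Tensor scalar = at::scalar_tensor(3.14); // 0-dim tensor
  // unfold on a 0-dim tensor yields a 1-dim tensor -- the behavior the
  // #40749 shape-analysis fix encodes.
  at::Tensor unfolded = scalar.unfold(/*dimension=*/0, /*size=*/1, /*step=*/1);
  std::cout << unfolded.dim() << "\n"; // prints 1
}
```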
This reverts commit 62da092.
Stack from ghstack:
Previously, when we would do shape analysis by running the op with representative inputs, we would always set the grad property to false. This led to a wrong static analysis when we would create differentiable subgraphs, and propagate shapes without also propagating requires_grad, and then uninline them.
Differential Revision: D22394676