split out functionalization codegen to use view_copy operators #75302
bdhirsh wants to merge 15 commits into gh/bdhirsh/197/base
Conversation
💊 CI failures summary (Dr. CI): as of commit 2bdc99d, no failures so far.
@bdhirsh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```cpp
{
HANDLE_TH_ERRORS
c10::impl::tls_set_dispatch_key_included(at::DispatchKey::Functionalize, false);
c10::impl::tls_set_dispatch_key_included(at::DispatchKey::FunctionalizeAddBackViews, false);
```
Isn't this going to prevent clean nesting?
In particular, if the python API should be a context manager, I would expect that it will re-set the state that was there before.
But maybe that's not considered here? If so, should we just forbid enabling it if it is already enabled?
Yeah you're right, I wasn't thinking too much about the nesting case. Maybe I'll just forbid it now until there's a need.
```python
# Convenience decorator for functions that explicitly take in a BackendIndex,
# instead of indirectly taking one in as a closure
def with_native_function_and_selector_and_index(
```
The comment above should go as well, no?
The comment also applies to the wrapper below the one that I deleted (technically it takes a dict of BackendIndices, I can fix that)
```python
'ops_headers',
'func_definitions',
'func_registrations',
'func_add_back_views_definitions',
```
Do we really need to shard since this is only for view functions?
yeah this is definitely overkill. Let me try to unwind it.
It's for all view + inplace + out= ops actually, which is quite large. Right now the 4 generated files are each 10k lines long.
Oh, the out ops are there? Then yes, that makes sense.
This PR splits the functionalization codegen into 2 pieces:
(1) Vanilla functionalization will now always turn view ops into "view_copy" ops.
(2) For functorch to "reapply views underneath the pass", I added a new dispatch key, "FunctionalizeAddBackViews". I codegen a kernel to that key for every view_copy operator that just calls back into the view op. All other ops get a fallthrough kernel.
Also - the codegen will now unconditionally register CompositeImplicitAutograd kernels directly to the functionalization keys, so we "always decompose" before hitting functionalization. Otherwise, we might break and accidentally send "view" calls to the backend, if we decompose an op into a view underneath the functionalization pass.
The important changes are in `gen.py` and `gen_functionalization_type.py` - most of the other changes are just plumbing `{view}_copy` everywhere. I also updated `test_functionalization.py`, and added expecttests for the "add back views" case.
One thing about the `AddBackViews` key - right now, I add it into the TLS include set. The other option would be to try to add it directly to the tensors, but that's kind of hard: putting it on the `FunctionalTensorWrapper` doesn't help, because the functionalization pass will unwrap when it calls back into the dispatcher, and run on the "inner tensor" (maybe we could modify the inner tensor's keyset and add the `AddBackViews` key when functionalization happens, instead?)
I also have an accompanying functorch change here: pytorch/functorch#678
Differential Revision: [D35419652](https://our.internmc.facebook.com/intern/diff/D35419652)
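As a rough illustration of the view vs. view_copy distinction the description relies on, here is a toy sketch (not PyTorch code; `Tensor`, `slice_view`, and `slice_view_copy` are made up for illustration): a view op aliases its input's storage, while the view_copy variant returns the same values in fresh storage.

```python
class Tensor:
    def __init__(self, storage, offset=0):
        self.storage = storage  # the (possibly shared) buffer
        self.offset = offset

def slice_view(t, start):
    # "view" semantics: the result aliases the input's storage
    return Tensor(t.storage, t.offset + start)

def slice_view_copy(t, start):
    # "view_copy" semantics: same values, but fresh storage (no aliasing);
    # list slicing creates a new list here
    return Tensor(t.storage[t.offset + start:], 0)

base = Tensor([1, 2, 3, 4])
v = slice_view(base, 1)
vc = slice_view_copy(base, 1)
base.storage[1] = 99
assert v.storage[v.offset] == 99   # the view observes the mutation
assert vc.storage[0] == 2          # the copy does not
```

Functionalization wants the copy semantics (no aliasing to track), while functorch's "reapply views" mode wants the aliasing behavior back.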
c10/core/DispatchKey.h (outdated)
```cpp
// removal,
// we can consider adding separate keys dedicated to those individual passes.
// See Note [Functionalization Pass In Core] for details.
FunctionalizeAddBackViews,
```
I'm wondering if we actually need a separate key for this. An alternative is to just have some TLS that Functionalize queries when implementing _copy operations.
Yeah I think I should just do this. We'll save the extra key, and it also feels easier to reason about than the extra redispatch (where in functorch, now you need to worry about having two redispatch's "sandwiched" inside of DynamicLayer).
Current plan: we already have a FuncTorchTLS construct in core, and AddBackViews is currently only a useful concept in the context of the functionalize() transform, so I'm thinking of adding a flag in there for it.
The functionalize view kernels would just check that TLS right before redispatching (to either a view or a view_copy op), and functorch's DynamicLayer logic would be responsible for setting / un-setting that TLS upon entering / exiting the layer.
(We could probably do something really similar for "preserving mutations"- I remember discussions about that being useful to mobile).
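A minimal Python model of that TLS-flag design (all names here are illustrative, not real PyTorch or functorch APIs): the view kernel consults a thread-local flag right before redispatching, and the transform is responsible for setting and unsetting it.

```python
import threading

_tls = threading.local()

def reapply_views_enabled() -> bool:
    # stand-in for the flag that would live on the FuncTorchTLS construct
    return getattr(_tls, "reapply_views", False)

def set_reapply_views(value: bool):
    # the transform's enter/exit logic would call this
    _tls.reapply_views = value

def functionalize_view_kernel(view_op, view_copy_op, *args):
    # redispatch to the aliasing view op if the flag is set,
    # otherwise to the out-of-place view_copy op
    if reapply_views_enabled():
        return view_op(*args)
    return view_copy_op(*args)

calls = []
view = lambda x: calls.append("view") or x
view_copy = lambda x: calls.append("view_copy") or x

functionalize_view_kernel(view, view_copy, 42)
assert calls == ["view_copy"]          # flag off: copy semantics

set_reapply_views(True)
functionalize_view_kernel(view, view_copy, 42)
assert calls == ["view_copy", "view"]  # flag on: views come back
```

Compared to a dedicated dispatch key, the branch happens inside one kernel, so there is no second redispatch to sandwich inside DynamicLayer.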
tools/codegen/api/types.py (outdated)
```diff
 # Some assertions: lambdas are only used for view ops
 assert f.is_view_op
-assert functional_op.is_view_op
+assert not functional_op.is_view_op
```
(I felt the same way lol)
Context: before this PR, we have view_inplace kernels that would redispatch to the view kernel (transpose_ to transpose). So the "functional_op" here was transpose, and both were view ops.
Now, f will be either a view or view_inplace op, and functional_op will always be the view_copy op.
```python
# Generates view_copy() composite kernels for all view_copy operators.
```
```python
FunctionalizationTarget = Enum('Target', (
```
`'FunctionalizationTarget'`, not `'Target'`
Also, Target is intended to be open, so you could add these there. Though I'd probably try to follow the existing naming convention (which says whether you are a DEFINITION or a DECLARATION in the name) more closely.
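For context on why the first argument matters: with the Enum functional API, that string becomes the class name the members report (in reprs, and when pickling), so a mismatch with the bound variable is visible:

```python
from enum import Enum

# The first argument to Enum() is the class name; if it doesn't match the
# variable you bind the enum to, members report the mismatched name
# (and pickling can break, since pickle looks the class up by that name).
Bad = Enum('Target', ('DEFINITION', 'DECLARATION'))
Good = Enum('FunctionalizationTarget', ('DEFINITION', 'DECLARATION'))

assert Bad.DEFINITION.__class__.__name__ == 'Target'
assert Good.DEFINITION.__class__.__name__ == 'FunctionalizationTarget'
```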
```python
g,
backend_indices[DispatchKey.CompositeImplicitAutograd],
FunctionalizationTarget.ADD_BACK_VIEWS,
),
```
It's not clear to me why you don't just generate both the add-back-views and regular things all in one go in a gen_functionalization_registration call. It would save you a lot of typing here.
(the thing Target exists for is when you're mapping over a list of functions, but no mapping is going on here...)
Updated codegen output would be much appreciated.
```python
view_inverse_sig = ViewInverseSignature(f)
return view_inverse_sig.decl()
view_api_name = g.view.func.name.unambiguous_name()
exprs = ', '.join([e.expr for e in translate(view_copy_sig.arguments(), view_sig.arguments())])
```
Sometime we should add some APIs that make this kind of "call-to-call" code less wordy haha
```diff
@@ -29,6 +32,11 @@
 # Generates view_copy() composite kernels for all view_copy operators.
```
This comment needs a nice, long update
yeah... it's actually up-to-date again now that I removed all of the AddBackView kernels
```diff
 funcTarget: FunctionalizationTarget,
 ) -> List[str]:
-if not needs_functionalization(selector, g):
+if not needs_functionalization(selector, g, Target.ANONYMOUS_DEFINITION, funcTarget):
```
how come this one is anonymous?
```python
f = g
assert modifies_arguments(f)
return [emit_inplace_functionalization_body(f, functional_op)]
if funcTarget == FunctionalizationTarget.FUNCTIONALIZATION:
```
it's more idiomatic to do these conditions as

```python
if funcTarget is FunctionalizationTarget.FUNCTIONALIZATION:
    ...
elif funcTarget is FunctionalizationTarget.ADD_BACK_VIEWS:
    ...
else:
    assert_never(funcTarget)
```

you then get compile-time exhaustiveness checking
ah yep - this will go away though if I'm using TLS to keep the views around instead of an extra dispatch key / set of kernels to generate.
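For readers unfamiliar with the pattern being suggested: with `is` comparisons and a trailing `assert_never`, a static type checker (e.g. mypy) will flag any enum member that isn't handled, and the runtime fallback raises loudly. A self-contained sketch (the `gen` function and its return strings are made up for illustration):

```python
from enum import Enum
from typing import NoReturn

class FunctionalizationTarget(Enum):
    FUNCTIONALIZATION = 1
    ADD_BACK_VIEWS = 2

def assert_never(x: NoReturn) -> NoReturn:
    # a type checker narrows `funcTarget` to Never if all branches are
    # covered; at runtime this is just a loud failure for unhandled values
    raise AssertionError(f"Unhandled value: {x!r}")

def gen(target: FunctionalizationTarget) -> str:
    if target is FunctionalizationTarget.FUNCTIONALIZATION:
        return "emit view_copy kernel"
    elif target is FunctionalizationTarget.ADD_BACK_VIEWS:
        return "emit view kernel"
    else:
        assert_never(target)

assert gen(FunctionalizationTarget.ADD_BACK_VIEWS) == "emit view kernel"
```

If a third member were added to the enum without a matching branch, mypy would reject the `assert_never(target)` call.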
```python
# TODO: later, I'll generate separate kernels for "Functionalize", and "AddBackViews"
# That will probably be an enum, that this function will take in to know which ops get kernels.
def needs_functionalization(
```
As you are now unconditionally calling gen_ops_header, it doesn't seem to me like there is any good reason to have factored out the needs_functionalization boolean test from the actual codegen. Just do the eligibility test in the codegen; it's not like you're saving any work by splitting it up this way, and then you have fewer asserts to do in the actual generating code.
```python
if isinstance(g, NativeFunctionsViewGroup):
    # Case 1: emit view -> view_copy kernels for the functionalization pass
    view_defs = []
    if not g.view.has_composite_implicit_autograd_kernel:
```
I understand this is from preexisting code, but the test for g.view.has_composite_implicit_autograd_kernel seems very obscure. All you're really doing is trying to distinguish between composite views and non-composite views. Is there any reason to put composite views in NativeFunctionsViewGroup? And if so, it seems like it would be better to make this a property on NativeFunctionsViewGroup itself, where its semantic meaning can be documented. Directly referencing has_composite_implicit_autograd_kernel feels super low-level for the logic here.
The code here allows for the possibility of a view op being CompositeImplicitAutograd but its view_inplace variant to not be (or vice versa).
I just checked though, and as per your comment further down: we don't actually have any ops like that today. So maybe I should just enforce "composite-ness" between view and view_inplace ops and simplify this code (make compositeness a property of the group).
Yeah, if you can enforce a more restrictive check, defo go for it.
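A hypothetical sketch of what lifting the check onto the group could look like (the `Func`/`ViewGroup` classes here are simplified stand-ins, not the real NativeFunctionsViewGroup): compositeness is validated as a group-level invariant at construction, then exposed as one documented property.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Func:
    name: str
    has_composite_implicit_autograd_kernel: bool

@dataclass
class ViewGroup:
    view: Func
    view_inplace: Optional[Func] = None

    def __post_init__(self):
        # enforce the more restrictive invariant: view and view_inplace
        # must agree on compositeness
        if self.view_inplace is not None:
            assert (self.view.has_composite_implicit_autograd_kernel
                    == self.view_inplace.has_composite_implicit_autograd_kernel), \
                f"compositeness mismatch in group for {self.view.name}"

    @property
    def composite(self) -> bool:
        """True iff every op in the group is CompositeImplicitAutograd."""
        return self.view.has_composite_implicit_autograd_kernel

g = ViewGroup(Func("transpose", False), Func("transpose_", False))
assert not g.composite
```

Callers then test `g.composite` instead of poking at each function's `has_composite_implicit_autograd_kernel`, and the invariant failure surfaces at group-construction time rather than deep in the codegen.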
```python
# if the view is not composite (implicit autograd)
assert g.view_copy is not None
view_defs.append(emit_view_functionalization_body(g.view, g.view_copy))
if g.view_inplace is not None and not g.view_inplace.has_composite_implicit_autograd_kernel:
```
is it ever the case that the view inplace has composite implicit autograd, but the view doesn't (or vice versa?)
(commented above - as of today, nope)
```python
view_str.append(emit_registration_helper(g.view_inplace, is_view=True))
return view_str
# we always want to register composite kernels
if g.view.has_composite_implicit_autograd_kernel:
```
The control flow here matches exactly the codegen logic below, is that right? This is exactly what Target was about; rather than having to keep the control flow thicket in sync, instead you just reuse the control flow and on the inside you then case on the target, so you only have to say the control flow once.
ezyang left a comment:

This looks functionally correct, but I do think it can be cleaner.
Pushed another round of updates. List of changes:

(1) Removed the […]. The main annoying thing here is for the view_inverse lambdas. I didn't want to have to manually write out twice as many "view_inverse" functions, so instead I updated them all to plumb through a […].

(2) Removed […].

(3) I ended up plumbing […].
This seems troublesome, because doesn't this make it impossible to give accurate schema information for the view inverse function? You don't know if it will alias or not without knowing the boolean setting.
@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged)

Can't merge closed PR #75302
Summary: Pull Request resolved: #75302
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D35419652
Pulled By: bdhirsh
fbshipit-source-id: 26c1249a4e2c8d606f9275b5c59970c3abd0886f
Pull Request resolved: #75302 Approved by: https://github.com/ezyang (cherry picked from commit cb17973)
Differential Revision: D35419652