generate inplace/out kernels for xla #57510
bdhirsh wants to merge 7 commits into gh/bdhirsh/113/base
Conversation
💊 CI failures summary: As of commit 9656b73, none of the CI failures appear to be caused by this PR. ✅
🚧 1 fixed upstream failure: this was probably caused by upstream breakages that were already fixed; rebasing should resolve it.
```python
    inplace_meta = True
else:

def gen_unstructured(self, f: NativeFunction, *, is_generated_wrapper: bool = False) -> Optional[str]:
    with native_function_manager(f):
```
The `with native_function_manager(f)` makes the diff painful, but there aren't too many changes in this function. As per the comment in the last PR, I can't easily use a decorator since I have a defaultable argument.
It's fine, reviewers can ignore whitespace in GH UI
Hooking this up to an XLA-side PR to sanity check.
This is a re-write of #56835, which is significantly shorter thanks to the data model change in the PR below this one in the stack. See the original description in the linked PR for details. The functional changes in this PR are the same as in the above linked one, so the description is the same with a few small changes:
- I don't bother generating `at::xla::{op}` entries for CPU fallbacks. After looking around, I see precedent for that. For example, we don't have `at::cpu::{op}` entries for composite ops; if you really want to bypass the dispatcher you need to call `at::compositeimplicitautograd::{op}`. Maybe we should revisit that later if we find an important use case for having full namespace coverage, but that doesn't seem worth half-fixing for external backends in this PR.
tools/codegen/model.py (Outdated)

```python
def functions(self) -> Iterator[NativeFunction]:
    yield self.out
    yield self.functional

def functions(self, *, functional_first: bool = False) -> Iterator[NativeFunction]:
```
yeah you're right, I think we should be able to unconditionally use the functional kernel first.
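A minimal sketch of what the flagged iterator could look like, using stand-in dataclasses rather than the real model.py types (names here are illustrative only):

```python
from dataclasses import dataclass
from typing import Iterator

# Hypothetical stand-ins for the real NativeFunction/NativeFunctionsGroup
# data model; the real group also carries inplace and mutable variants.
@dataclass
class NativeFunction:
    name: str

@dataclass
class NativeFunctionsGroup:
    functional: NativeFunction
    out: NativeFunction

    def functions(self, *, functional_first: bool = False) -> Iterator[NativeFunction]:
        # Yield the functional variant first when the caller (e.g. the
        # out/inplace wrapper codegen) treats it as the canonical kernel.
        if functional_first:
            yield self.functional
            yield self.out
        else:
            yield self.out
            yield self.functional

g = NativeFunctionsGroup(NativeFunction("add"), NativeFunction("add.out"))
print([f.name for f in g.functions(functional_first=True)])  # ['add', 'add.out']
```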
```cpp
TORCH_CHECK(options.device() == out.device(),
    "Expected out tensor to have device ", options.device(), ", but got ", out.device(), " instead");
bool resized = at::native::resize_output(outputs_[output_idx], sizes);
bool resized = {self.cpp_namespace}::resize_output(outputs_[output_idx], sizes);
```
This seems a little questionable; are you actually changing what namespace we reference `resize_output` from here?
Yep, that's a bug, thanks! `at::native::resize_output` is the thing we want to call to resize the output tensor, regardless of which backend we're generating for. We wouldn't have hit it until we eventually tried to enable structured external kernels.
I think that's true for the `as_strided_` call below too.
Also for the empty calls above, I think
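A sketch of the fix being discussed, assuming the generated C++ is assembled from Python f-strings as elsewhere in the codegen; the helper name `gen_resize_call` is hypothetical:

```python
# The point of the fix: the emitted call should always be
# at::native::resize_output (the helper that actually resizes the output
# tensor), never the external backend's namespace substituted via
# {self.cpp_namespace}, since that helper only exists under at::native.
def gen_resize_call(output_idx: int) -> str:
    return (
        f"bool resized = at::native::resize_output("
        f"outputs_[{output_idx}], sizes);"
    )

print(gen_resize_call(0))
```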
```python
# We also skip registrations if there is a functional backend kernel,
# because we generate out/inplace wrappers in that case (handled in register_dispatch_key.py).
if self.backend_index.get(f) is not None or \
        (isinstance(g, NativeFunctionsGroup) and self.backend_index.get(g.functional) is not None):
```
The downside to having this split out into a separate dest here is that you have to keep the logic in sync with register_dispatch_key.py for when to generate out/inplace wrappers. This is probably OK, because there are fairly large benefits to having this be a separate file, but it would be good to explicitly cross-reference the corresponding condition and/or have a single function that commons up the logic (even though it is short to type).
```python
else:
    assert_never(f)

def gets_generated_out_inplace_wrapper(self, f: NativeFunction, g: NativeFunctionsGroup) -> bool:
```
This is the code to xref, I think
Yep, thanks; this should be shared with gen_external_aten_fallbacks.py. It's another thing we won't have to worry about later, when/if the boxed CPU fallback works out. (That's also why I kind of wanted to keep all the fallback stuff in a separate file, so it'll be easier to disentangle if we want to.)
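A sketch of what a single shared predicate could look like, with stub stand-ins for the real NativeFunction/BackendIndex types (all names here are illustrative, not the actual codegen API):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set

# Minimal stand-ins for the real data model.
class SchemaKind(Enum):
    functional = 0
    inplace = 1
    out = 2

@dataclass
class NativeFunction:
    name: str
    kind: SchemaKind

@dataclass
class NativeFunctionsGroup:
    functional: NativeFunction
    out: NativeFunction

class BackendIndex:
    def __init__(self, kernels: Set[str]):
        self.kernels = kernels  # names of ops this backend implements directly

    def get(self, f: NativeFunction) -> Optional[str]:
        return f.name if f.name in self.kernels else None

def gets_generated_out_inplace_wrapper(
    backend_index: BackendIndex, f: NativeFunction, g: NativeFunctionsGroup
) -> bool:
    # A wrapper is generated when the backend has no kernel for this
    # out/inplace op but does have one for the functional op in its group.
    return (
        f.kind is not SchemaKind.functional
        and backend_index.get(f) is None
        and backend_index.get(g.functional) is not None
    )

add = NativeFunction("add", SchemaKind.functional)
add_out = NativeFunction("add.out", SchemaKind.out)
idx = BackendIndex({"add"})  # e.g. XLA implements only the functional kernel
print(gets_generated_out_inplace_wrapper(idx, add_out, NativeFunctionsGroup(add, add_out)))
```

Both register_dispatch_key.py and the external-fallback codegen could then import this one function instead of duplicating the condition.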
```python
# All other functionality can be directly pulled from gen_unstructured.
is_generated_wrapper = self.gets_generated_out_inplace_wrapper(f, g)
if self.target is not Target.ANONYMOUS_DEFINITION:
    return self.gen_unstructured(f, is_generated_wrapper=is_generated_wrapper)
```
This short circuit smacks of "we put the interception point for gen_inplace_out_wrapper too early"--to tell if this function is actually implemented correctly, I still have to understand all of gen_unstructured (and there is now an extra illegal state which is when is_generated_wrapper=True but self.target is Target.ANONYMOUS_DEFINITION). Instead of the condition at https://github.com/pytorch/pytorch/pull/57510/files?diff=unified&w=1#diff-e9998363493c5dcd97969d077dcdea1e835c95bf39f2bbb5afde127a3c1ef728R82 ; maybe you should just directly embed this logic in gen_unstructured. You can guarantee that you end up in there even in the structured case by adjusting the condition at https://github.com/pytorch/pytorch/pull/57510/files?diff=unified&w=1#diff-e9998363493c5dcd97969d077dcdea1e835c95bf39f2bbb5afde127a3c1ef728R170 appropriately. If the complaint is that gen_unstructured is too long, I'd suggest remedying that by refactoring the internal codegen pieces of gen_unstructured into helper methods.
Agreed that all the conditional logic that decides when to call gen_unstructured vs. gen_inplace_out_wrapper is hard to follow, and I like the idea of keeping it all in the same place: that section of gen_structured() that you pointed out.
My main motivation for keeping this in a separate function was that this is the only bit of unstructured codegen that actually cares about the entire NativeFunctionsGroup object, so it would be confusing to pass an extra (optional) NativeFunctionsGroup object to gen_unstructured that's only used under certain conditions. But you're right, factoring out bits of gen_unstructured into helper functions should be fine.
Might be a good idea to just pass it unconditionally, tbh, in case more codegen in unstructured kernels starts making use of the group information.
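A rough sketch of that shape, with stub types, where the group is passed unconditionally and the wrapper decision lives inside gen_unstructured (all names and the dict-based group are illustrative stand-ins):

```python
from typing import Optional

# Stub class sketching the suggested refactor: gen_unstructured receives the
# (possibly absent) group directly, and derives is_generated_wrapper itself,
# instead of having the decision threaded in from gen_structured's call site.
class RegisterDispatchKey:
    def __init__(self, functional_kernels):
        self.functional_kernels = functional_kernels

    def gets_generated_out_inplace_wrapper(self, f: str, g: dict) -> bool:
        return f != g["functional"] and g["functional"] in self.functional_kernels

    def gen_out_inplace_wrapper(self, f: str, g: dict) -> str:
        return f"// generated wrapper for {f} calling {g['functional']}"

    def gen_unstructured(self, f: str, g: Optional[dict] = None) -> Optional[str]:
        if g is not None and self.gets_generated_out_inplace_wrapper(f, g):
            return self.gen_out_inplace_wrapper(f, g)
        return f"// direct registration for {f}"

r = RegisterDispatchKey({"add"})
print(r.gen_unstructured("add.out", {"functional": "add"}))
```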
```python
updates = f'at::_copy_from_and_resize({functional_result_name}, {ret_name});'
returns = ret_name

functional_sig = self.wrapper_kernel_sig(g.functional)
```
Although you can skip the translate call here because everything is the same, I wonder if it would be clearer for readers to translate anyway... (maybe there's some stuff translate chokes on?)
The argument for consistency makes sense to me, I'll add it in (just tested it and translate doesn't choke)
```python
return_names = cpp.return_names(f)
if len(return_names) > 1:
    updates = '\n '.join(
        f'at::_copy_from_and_resize(std::get<{i}>({functional_result_name}), {ret_name});'
```
I guess you shouldn't actually be willing to resize if it's an inplace; this is only a very modest oversight though.
Oh, good catch. I think that any size mismatch errors would be caught in the meta function, so we should never end up in that situation. But getting the codegen to explicitly call a non-resizing function for inplace wrappers definitely expresses intent more clearly.
Actually, yeah, that was a straight-up bug, since a lot of XLA kernels don't do shape checks (LazyTensor meta() support can't come soon enough). Before, when I was using _copy_from_and_resize() for inplace ops, we would just resize self on XLA, while CPU would correctly error in the meta function.
Using _copy_from gives you a pretty crappy error message, but there's probably no point trying to make that better until we start using meta functions.
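A sketch of the resulting codegen branch; `gen_copy_back` is a hypothetical helper name, and the emitted strings are the C++ calls discussed above:

```python
# Inplace wrappers copy the functional result back with at::_copy_from
# (sizes must already match), while out wrappers use
# at::_copy_from_and_resize, which is allowed to resize the out tensor.
def gen_copy_back(ret_name: str, dst_name: str, *, is_inplace: bool) -> str:
    if is_inplace:
        # Never resize self: a size mismatch should surface as an error
        # (ideally in the meta function), not be silently papered over.
        return f"at::_copy_from({ret_name}, {dst_name});"
    return f"at::_copy_from_and_resize({ret_name}, {dst_name});"

print(gen_copy_back("tmp_output", "self", is_inplace=True))
```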
```python
# The schema this signature is derived from
func: FunctionSchema

prefix: str = ""
```
worth a comment? it opens up a pretty big axis of variation for what the class represents...
Definitely fair, I'll add it in
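A sketch of what the commented prefix field might look like, simplified: the real class carries a FunctionSchema rather than a plain name, so the fields here are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DispatcherSignature:
    func_name: str
    # prefix is prepended to the kernel name by name(). It exists purely so
    # that generated wrapper functions get unique names within one output
    # file; it does not change the signature's types or the schema it is
    # derived from.
    prefix: str = ""

    def name(self) -> str:
        return self.prefix + self.func_name

sig = DispatcherSignature("add_out", prefix="wrapper_out_")
print(sig.name())  # wrapper_out_add_out
```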
```python
@staticmethod
def from_schema(func: FunctionSchema) -> 'DispatcherSignature':
    return DispatcherSignature(func)

def from_schema(func: FunctionSchema, *, prefix: str = '') -> 'DispatcherSignature':
```
```python
# Helper functions

def kernel_signature(f: NativeFunction, backend_index: BackendIndex) -> Union['NativeSignature', 'DispatcherSignature']:

def kernel_signature(
```
ditto re comment for prefix - also I'm not far enough into the PR yet to understand why it's a freestanding param and not part of the data model.
```python
def wrapper_kernel_sig(self, f: NativeFunction) -> Union[NativeSignature, DispatcherSignature]:
    # The prefix is just to ensure uniqueness. The Dispatcher API doesn't guarantee unique kernel names.
    return kernel_signature(f, self.backend_index, prefix=f'wrapper_{f.func.name.overload_name}_')
```
Just a high level observation and not to overengineer, but here prefix is derived from f in a pretty straightforward way. If that's true everywhere, then it might be cleaner to host the prefix-making logic in NativeFunction.
So, let me know if we disagree, but the value of prefix here seems to me more like an implementation detail of how we code-generate the stuff that we register to the dispatcher than a property of a NativeFunction. Rather than directly registering native kernels to the dispatcher, the codegen creates anonymous wrappers that call into the native kernels, and registers those wrappers instead. We just prepend wrapper_ to each wrapper function's name for clarity, and to avoid name collisions with the namespaced definitions that we generate in the same file.
So I kind of see NativeSignature/DispatcherSignature as a lens into how to display a given NativeFunction's signature, like what schema and naming transformations to apply, and the wrapper prefix is just another knob along those same lines.
[Edit: to be clear, this is a minor point in the context of the PR! It just touches on an interesting data modeling design point. Totally fine with me for this to land as-is if you'd rather move things along.]
Ah sorry! Not NativeFunction but rather NativeSignature and DispatcherSignature. You're right that prefixes shouldn't be part of NativeFunction's world. But the only "free" value here is the 'wrapper_{}_' - the function name in the middle comes from f, which is 1-1 with whichever Signature wrapper_kernel_sig() constructs - so making the caller supply the correspondence here would only be useful if we ever wanted to break that relationship, which I don't think we would, right? It seems more like an opportunity for the caller to mess things up by passing a constant string or whatever.
Anyway to cut to the chase, the whole correct by construction thing would say, only pass the constant part of the prefix here and onward into the Signatures, and make them (the Signatures) responsible for folding f.func.name.overload_name into nonempty prefixes.
(BTW if you don't actually use multiple prefixes, you could just have the Signatures carry a bit saying whether to mangle in the overload name. (And if kernel_signature() is always called with a nonempty prefix, you could just have it unilaterally construct the Signature instances with prefixes turned on.)
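A sketch of that alternative, where the Signature carries a mangling bit instead of a caller-supplied prefix string (the field and method names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DispatcherSignature:
    base_name: str
    overload_name: str
    # When set, name() folds the overload name into a wrapper_ prefix itself,
    # so callers can't pass an inconsistent constant string.
    mangle_overload: bool = False

    def name(self) -> str:
        if self.mangle_overload:
            return f"wrapper_{self.overload_name}_{self.base_name}"
        return self.base_name

sig = DispatcherSignature("add", "out", mangle_overload=True)
print(sig.name())  # wrapper_out_add
```

The "correct by construction" benefit: the overload-name correspondence is enforced inside the Signature, and the caller only toggles a flag.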
@bdhirsh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Pull Request resolved: #57510. This is a re-write of #56835; see the full description above.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D28474364
Pulled By: bdhirsh
fbshipit-source-id: 4d58b60e5debad6f1ff06420597d8df8505b2876