generate in-place/out wrappers for external kernels #56835
bdhirsh wants to merge 4 commits into gh/bdhirsh/109/base from
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediations
As of commit 847ab16 (more details on the Dr. CI page):
- 4 failures not recognized by patterns
- ci.pytorch.org: 1 failed
This comment was automatically generated by Dr. CI.
[ghstack-poisoned]
ExternalBackendFunction,
ExternalBackendFunctionsGroup,
Union[NativeFunction, NativeFunctionsGroup],
Union[NativeFunction, ExternalBackendFunction],
the blow-up continues
**Overview**

This PR adds support for auto-generating in-place/out wrappers from a functional kernel for external backends.

- If the backend provides a functional kernel but not an out kernel, we generate one for them.
- If the backend provides a functional kernel but not an in-place kernel, we generate one for them.
- If they explicitly provide either kernel, we use whatever they provided.

With this change, the codegen generates two new files for XLA that are more in line with what our core codegen generates:

- `RegisterXLA.cpp`
- `RegisterAutogradXLA.cpp`

**Functionality delta**

For context, what XLA’s codegen currently provides is:

- Support for generating `out` wrapper kernels, which are specified in a hardcoded list. This PR subsumes that, and generates out kernels whenever we encounter an op that has a functional kernel but no out kernel.
- No support for in-place wrappers. XLA has hand-written kernels for the in-place variants of most ops. I still need to confirm this, but I think we’ll be able to remove a lot of that code in favor of these codegen’d wrappers later. As a test, I have a test PR [here](pytorch/xla#2915) that removes `add_` and uses the codegen’d version instead.

**Code changes**

The PR also splits the external backend codegen logic over two places:

- `register_dispatch_key.py`. Contains all the "new" stuff: kernel wrappers + dispatcher registrations for all XLA kernels, including codegen'd in-place/out wrappers. It also provides support for namespaced definitions, e.g. `at::xla::{op}`, although this feature isn't "turned on" yet.
- `gen_external_aten_fallbacks.py`. I updated this file to *really* only do fallback-related stuff, which should make it easier to kill if we switch to a boxed fallback later. It no longer generates `out` wrappers, and it no longer generates dispatcher registrations for all kernels; only for CPU fallbacks.

**Logic re-use in the codegen**

Just want to call out my reasoning for what goes where in the codegen w.r.t. two changes:

1. For normal unstructured kernels, I almost entirely re-used `RegisterDispatchKey.gen_unstructured`, with some changes to make it work for external backend kernels.
2. For generating the out/in-place wrappers, I pushed all of that logic into its own function; it's small enough and different enough from the existing structured/unstructured logic that I had a hard time generalizing the existing logic to handle this case.
3. One note on the split between CPU fallback logic vs. everything else. There's one area where the logic bleeds over: right now in `gen_unstructured()`, I actually generate namespaced definitions for all backend kernels, including those that use a CPU fallback. My thought process was that the goal of a CPU fallback is to paper over deficiencies in what the backend has implemented, and in a similar vein it would be nice if `at::xla::{op}` "just worked" for all ops, if someone wanted to write C++ code in a uniform API meant just for XLA.

Before (pictorially):

```
RegisterDispatchKey.__call__() -------> gen_unstructured()
```

After (pictorially). My ascii art is awful but hopefully you get the idea:

```
RegisterDispatchKey.__call__() -----------------------------> gen_unstructured()
                               \                             /
                                --> gen_inplace_out_wrapper()
```

`gen_inplace_out_wrapper()` has the specific task of generating anonymous definitions (wrappers) for in-place/out kernels, and otherwise forwards all other logic, including dispatcher registrations + namespaced definitions, to `gen_unstructured()`.

One wrinkle: `gen_unstructured` takes in a single `Union[NativeFunction, ExternalBackendFunction]`, which technically isn't enough context to decide when to generate registrations/namespaced definitions. That's because it now also needs the context of the entire function group: if the backend has not implemented an `out` kernel, then whether or not to generate a registration for it depends entirely on whether the backend has implemented a `functional` kernel. I ended up adding an extra optional argument to `gen_unstructured()` to deal with that: a bool that tells `gen_unstructured` whether it's dealing with an autogenerated wrapper.

**What this PR does not do**

Structured kernel support for external backends. To add at some point in the future.

[ghstack-poisoned]
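To make the generation rule concrete, here's a minimal sketch in Python of the decision described above. `FunctionsGroup` and its fields are illustrative stand-ins, not the actual codegen data model:

```python
# Illustrative sketch of the wrapper-generation rule described above.
# FunctionsGroup and its fields are hypothetical, not the real codegen's types.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FunctionsGroup:
    functional: Optional[str]  # name of the backend's functional kernel, if any
    inplace: Optional[str]     # name of the backend's in-place kernel, if any
    out: Optional[str]         # name of the backend's out kernel, if any

def kernels_to_generate(g: FunctionsGroup) -> List[str]:
    """Return which wrappers the codegen would synthesize for this group."""
    generated = []
    # Wrappers can only be generated when a functional kernel exists to call.
    if g.functional is not None:
        if g.out is None:
            generated.append('out')
        if g.inplace is None:
            generated.append('inplace')
    return generated
```

For example, a backend that only implements the functional `add` kernel would get both wrappers generated, while one that also hand-writes `add_` would only get the out wrapper.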
)
return list(mapMaybe(structured_gen.gen_one, g.functions()))

@method_with_native_function
So, I'm going to have to add this back before I merge the PR. One annoying aspect of my change is that I added an optional is_generated_wrapper parameter to gen_unstructured. In order to re-use gen_unstructured for the code-generated inplace/out wrappers, gen_unstructured needs that extra bit of information: it's not available just from the ExternalBackendFunction, because it requires information relevant to the entire function group. i.e. we only generate a wrapper if (a) the inplace/out kernel is not implemented, and (b) the functional kernel is implemented.
Another option would have been to try to factor out the logic for registrations/definitions/declarations into their own functions that are then called by both gen_unstructured and the gen_out_inplace_wrappers code, but I tentatively found the approach I took here to be less confusing.
After some digging I couldn't find a way to make a flavor of `method_with_native_function` that properly deals with defaultable arguments, so my plan was to add a `with native_function_manager(f): ...` at the beginning of this function. That would indent the entire function and make it impossible to see the delta in this PR, though, so my plan is to wait until it's been reviewed to add that in. Or take it out completely, depending on what the feedback is :)
I didn't understand all of this (still reading), but indentation is not a problem, you can tell people to use GH ignore whitespace feature (or review on phab)
Probably just need to rework @method_with_native_function entirely. Tis really annoying to do decorators and strongly typed Python at the same time.
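For what it's worth, one way to make a `method_with_native_function`-style decorator tolerate extra defaultable arguments is to forward `*args`/`**kwargs` through the wrapper. A hedged sketch; the context manager below is a stand-in, not the real `native_function_manager`:

```python
import functools
from contextlib import contextmanager

@contextmanager
def native_function_context(f):
    # Stand-in for native_function_manager: would set up per-function
    # error-reporting context keyed on f.
    yield

def method_with_native_function(func):
    # Forwarding *args/**kwargs lets the wrapped method keep extra
    # defaultable parameters (e.g. is_generated_wrapper=False).
    @functools.wraps(func)
    def wrapper(self, f, *args, **kwargs):
        with native_function_context(f):
            return func(self, f, *args, **kwargs)
    return wrapper
```

This keeps the decorated method's defaulted arguments callable both positionally and by keyword, at the cost of losing the precise typed signature (which is the usual pain point with decorators and strongly typed Python).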
# This is needed in order for namespaced definitions to call into CPU fallbacks,
# which live in a different namespace.
# TODO: this logic will change if we implement a boxed CPU fallback.
if isinstance(native_or_external, ExternalBackendFunction) \
if we want at::xla::{op} to work for all ops (including cpu fallbacks), this is where the extra logic lives.
I take it back, maybe we shouldn't generate at::xla 😂
# This isn't an important characteristic of the dispatcher API, and could be removed if we want to further unify
# the dispatcher and C++ APIs. There happen to be a few places in the codegen where we need to guarantee name uniqueness.
name = f'{name}_{func.name.overload_name}'
return name
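The uniqueness scheme in the snippet above can be illustrated in isolation; the helper below is hypothetical, not the actual codegen function:

```python
def unique_kernel_name(base: str, overload_name: str) -> str:
    """Append the overload name so overloads like add.Tensor vs. add.Scalar
    get distinct generated names; default overloads keep the bare name."""
    if overload_name:
        return f'{base}_{overload_name}'
    return base
```

So `add.Tensor` and `add.Scalar` would map to distinct generated names, while the default (unnamed) overload stays as just `add`.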
elif isinstance(f, NativeFunction):
elif isinstance(f, ExternalBackendFunctionsGroup):
    if f.structured:
        raise AssertionError("structured kernels not implemented yet for external backends")
Do you even ever fill structured field on ExternalBackendFunctionsGroup?
yield self.functional
yield self.out
if self.inplace is not None:
    yield self.inplace
so... when do you ever need the out variant to come first?
yeah, you don't. I guess I just left the out-first version in to be consistent with the in-tree codegen, but that's not really necessary. I'll just kill the flag and add a comment explaining why we need functional to come first.
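The functional-first iteration the reply above settles on could look like this sketch (class and field names are illustrative stand-ins for the real data model):

```python
# Sketch of functional-first iteration over a function group. The functional
# variant is yielded first because the generated inplace/out wrappers depend
# on the functional kernel having been processed.
class Group:
    def __init__(self, functional, out, inplace=None):
        self.functional = functional
        self.out = out
        self.inplace = inplace

    def functions(self):
        yield self.functional
        yield self.out
        if self.inplace is not None:
            yield self.inplace
```

With the out-first/functional-first flag removed, callers always see the functional variant first, and the inplace slot is simply skipped for groups that don't have one.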
This is a re-write of #56835, which is significantly shorter thanks to the data model change in the PR below this one in the stack. See the original description in the linked PR for details. The functional changes in this PR are the same as in the above linked one, so the description is the same with a few small changes: - I don't bother generating `at::xla::{op}` entries for CPU fallbacks. After looking around, I see precedent for that. For example, we don't have `at::cpu::{op}` entries for composite ops; if you really want to bypass the dispatcher you need to call `at::compositeimplicitautograd::{op}`. Maybe we should revisit that later if we find an important use case for having full namespace coverage, but that doesn't seem worth half-fixing for external backends in this PR. [ghstack-poisoned]
Summary: Pull Request resolved: #57510 This is a re-write of #56835, which is significantly shorter thanks to the data model change in the PR below this one in the stack. See the original description in the linked PR for details. The functional changes in this PR are the same as in the above linked one, so the description is the same with a few small changes: - I don't bother generating `at::xla::{op}` entries for CPU fallbacks. After looking around, I see precedent for that. For example, we don't have `at::cpu::{op}` entries for composite ops- if you really want to bypass the dispatcher you need to call `at::compositeimplicitautograd::{op}`. Maybe we should revisit that later if we find an important use case for having full namespace coverage, but that doesn't seem worth half-fixing for external backends in this PR. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D28474364 Pulled By: bdhirsh fbshipit-source-id: 4d58b60e5debad6f1ff06420597d8df8505b2876
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
Stack from ghstack: