[experimental, WIP] use codegen API#2869

Closed
bdhirsh wants to merge 6 commits into master from public_codegen_api

Conversation

@bdhirsh
Contributor

@bdhirsh bdhirsh commented Apr 9, 2021

No description provided.

@bdhirsh bdhirsh force-pushed the public_codegen_api branch from 3348b02 to 75bebab on April 12, 2021 19:03
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 16, 2021
…n-tree"

This PR ports the codegen logic from the [XLA repo](https://github.com/pytorch/xla/blob/master/scripts/gen.py) to be in-tree.

This particular PR maintains (almost) byte-for-byte compatibility with XLA's codegen, with the differences highlighted below. That means that there's some xla-specific stuff in the codegen, which will be removed in subsequent PRs.

**Plan**
- [ ] (this PR) merge byte-for-byte version of XLA codegen in-tree
- [ ] Get XLA to use the in-tree codegen
- [ ] Remove the xla-specific `bridge::` API from the codegen
- [ ] Kill the CPU fallback codegen in favor of a boxed kernel
- [ ] Remove other xla-specific bits, like the file names (`aten_xla_type.h`)
- [ ] Update codegen to properly handle generating `inplace`/`out`
- [ ] Update codegen to handle structured

**New codegen workflow**
I have a corresponding PR in the `pytorch/xla` repo that uses these changes [here](pytorch/xla#2869).
After checking out pytorch/xla from source, you can run `python ../tools/codegen/gen_backend_stubs.py --output_dir="torch_xla/csrc/" --source_yaml="xla_native_functions.yaml"`. It spits out 3 files, which are close to byte-for-byte identical to what pytorch/xla currently has:
* `torch_xla/csrc/aten_xla_type.h`
* `torch_xla/csrc/aten_xla_type_default.h`
* `torch_xla/csrc/aten_xla_type_default.cpp`

`aten_xla_type.h` is currently maintained manually in the pytorch/xla repo; instead, a new `xla_native_functions.yaml` file will maintain the list of ops that xla supports, and the codegen will generate the header file. This is more similar to what our other in-tree codegen does. Unfortunately I couldn't keep it byte-for-byte identical, because the original order of the signatures was arbitrary, but the file contains the same set of declarations with the same signatures.
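To illustrate the yaml-driven workflow, here is a minimal sketch of how a list of supported ops could drive header generation. This is not the actual `gen_backend_stubs.py`; the op names, signatures, and class layout are illustrative placeholders:

```python
# Hypothetical sketch of yaml-driven header generation (NOT the real
# gen_backend_stubs.py): take the set of ops a backend declares support
# for and emit one declaration per op into a generated header.

# Stand-in for the parsed contents of xla_native_functions.yaml.
supported_ops = {
    "abs": "at::Tensor abs(const at::Tensor& self);",
    "add.Tensor": "at::Tensor add(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha);",
}

def generate_header(ops: dict) -> str:
    # Sort deterministically so output is stable across runs (the PR notes
    # the hand-written header's ordering was arbitrary).
    decls = [sig for _, sig in sorted(ops.items())]
    body = "\n".join(f"  static {d}" for d in decls)
    return (
        "// @generated from xla_native_functions.yaml\n"
        "class AtenXlaType {\n"
        " public:\n"
        f"{body}\n"
        "};\n"
    )

header = generate_header(supported_ops)
```

The key design point is that the yaml file becomes the single source of truth for which ops the backend implements, and the header is a derived artifact.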


**Byte-for-byte changes**

`aten_xla_type_default.h` contains the headers for functions that fall back to CPU, same as before (identical sans comments and whitespace, and `Stream` -> `at::Stream` [they had a `using c10::Stream` somewhere]).

`aten_xla_type_default.cpp` contains a few things:
* kernels that fall back to CPU for any non-composite ops that xla doesn't support
* code-generated out kernels for a few operators, which just call into the xla-implemented kernel
* dispatcher registrations for all relevant kernels (xla kernels, codegen'd out wrappers, codegen'd fallbacks)

That file is byte-for-byte identical except for a few differences:
* namespaces: `c10::Device` -> `at::Device`, and `Stream` -> `at::Stream`
* The other diffs are smaller and easy to look over, I pasted them here: P408712288
  * We no longer manually unpack `TensorOptions` in the CPU fallbacks, and instead use the faithful C++ API
  * The codegen'd CPU fallback for `miopen_rnn_backward` had a small bug that this codegen fixes: it wasn't converting all output tensor arguments from CPU back to XLA properly.
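For a sense of what the codegen'd fallbacks do, here is a hedged sketch of the pattern: move every XLA input to CPU, call the reference CPU kernel, and move every output back. The template and the `bridge::` helper names below are simplified placeholders modeled on the description above, not the real generated code:

```python
# Illustrative sketch of the C++ a CPU fallback codegen might emit for one
# op. The helper names (bridge::XlaToCpuTensor etc.) and the fixed
# two-tensor signature are placeholders, not the actual codegen output.

CPU_FALLBACK_TEMPLATE = """\
std::tuple<at::Tensor, at::Tensor> AtenXlaTypeDefault::{name}(
    const at::Tensor& self, const at::Tensor& other) {{
  // 1. Move every XLA input to CPU.
  auto self_cpu = bridge::XlaToCpuTensor(self);
  auto other_cpu = bridge::XlaToCpuTensor(other);
  // 2. Run the reference CPU kernel.
  auto result = at::{name}(self_cpu, other_cpu);
  // 3. Move *all* outputs back to the XLA device -- the step the old
  //    miopen_rnn_backward fallback missed for some outputs.
  return std::make_tuple(
      bridge::CpuToXlaTensor(std::get<0>(result)),
      bridge::CpuToXlaTensor(std::get<1>(result)));
}}
"""

def gen_cpu_fallback(op_name: str) -> str:
    # Instantiate the template for a single operator name.
    return CPU_FALLBACK_TEMPLATE.format(name=op_name)
```

Generating this boilerplate per-op is exactly the logic the plan above intends to replace with a single boxed fallback kernel.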


**Data model changes**
* New `ExternalBackendMetadata`, `ExternalBackendFunction`, and `ExternalBackendFunctionsGroup` classes. `ExternalBackendMetadata` represents a line from `xla_native_functions.yaml`, and `ExternalBackendFunction/ExternalBackendFunctionsGroup` are data representations that link the metadata to the corresponding native functions.
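A minimal sketch of how these three classes could fit together, with field names guessed from the description above (the real definitions in the PR may differ):

```python
# Hypothetical data-model sketch; field names and types are guesses based
# on the PR description, not the actual class definitions.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ExternalBackendMetadata:
    # One line from xla_native_functions.yaml: an op the backend implements.
    operator_name: str

@dataclass(frozen=True)
class ExternalBackendFunction:
    # Links backend metadata (if any) to the in-tree native function entry.
    native_function: str  # stand-in for the real NativeFunction object
    metadata: Optional[ExternalBackendMetadata]

    @property
    def has_backend_kernel(self) -> bool:
        # No metadata means the op falls back to the codegen'd CPU kernel.
        return self.metadata is not None

@dataclass(frozen=True)
class ExternalBackendFunctionsGroup:
    # Groups variants of one op, mirroring how the in-tree codegen groups
    # functional/out pairs.
    functional: ExternalBackendFunction
    out: Optional[ExternalBackendFunction] = None
```

The split mirrors the in-tree codegen's approach: yaml metadata stays a thin record, and the `ExternalBackendFunction*` wrappers join it to the richer native-function model.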


**NOTE: What is staying/leaving after this PR**
The bulk of the changes live in `gen_external_fallbacks.py`. It contains:
* declarations/definitions for CPU fallbacks
* code-generated `out` wrappers (same as what XLA does)
* all dispatcher registrations

All of the CPU fallback logic will hopefully be killed in the near future, when I replace it with a boxed fallback. I opted to put it in a new file (`gen_external_fallbacks.py`) because (a) most of it will disappear soon, making it easier to disentangle later, and (b) it made merge conflicts (slightly) easier to deal with while working on this PR :)

When that logic is killed, I'm planning to move the remaining logic (codegen'd out wrappers + dispatcher registrations) into `register_dispatch_key.py`, since the logic will eventually look a lot more similar.




Differential Revision: [D27708346](https://our.internmc.facebook.com/intern/diff/D27708346)

[ghstack-poisoned]
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 16, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 21, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 21, 2021
@JackCaoG
Collaborator

JackCaoG commented Oct 1, 2021

@bdhirsh Do we still need this PR?

@bdhirsh
Contributor Author

bdhirsh commented Oct 1, 2021

this is old and can definitely be closed 😛

I had this open while I was experimenting with integrating the public codegen API with XLA

@bdhirsh bdhirsh closed this Oct 1, 2021