[experimental, WIP] use codegen API#2869

Closed
bdhirsh wants to merge 6 commits into master from public_codegen_api

Conversation

@bdhirsh
Contributor

@bdhirsh bdhirsh commented Apr 9, 2021

No description provided.

@bdhirsh bdhirsh force-pushed the public_codegen_api branch from 3348b02 to 75bebab on April 12, 2021 19:03
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 16, 2021
…n-tree"

This PR ports the codegen logic from the [XLA repo](https://github.com/pytorch/xla/blob/master/scripts/gen.py) to be in-tree.

This particular PR maintains (almost) byte-for-byte compatibility with XLA's codegen, with the differences highlighted below. That means that there's some xla-specific stuff in the codegen, which will be removed in subsequent PRs.

**Plan**
- [ ] (this PR) merge byte-for-byte version of XLA codegen in-tree
- [ ] Get XLA to use the in-tree codegen
- [ ] Remove the xla-specific `bridge::` API from the codegen
- [ ] Kill the CPU fallback codegen in favor of a boxed kernel
- [ ] Remove other xla-specific bits, like the file names (`aten_xla_type.h`)
- [ ] Update codegen to properly handle generating `inplace`/`out`
- [ ] Update codegen to handle structured

**New codegen workflow**
I have a corresponding PR in the `pytorch/xla` repo that uses these changes [here](pytorch/xla#2869).
After checking out pytorch/xla from source, you can run `python ../tools/codegen/gen_backend_stubs.py --output_dir="torch_xla/csrc/" --source_yaml="xla_native_functions.yaml"`. It spits out 3 files, which are close to byte-for-byte identical to what pytorch/xla currently has:
* `torch_xla/csrc/aten_xla_type.h`
* `torch_xla/csrc/aten_xla_type_default.h`
* `torch_xla/csrc/aten_xla_type_default.cpp`

`aten_xla_type.h` is currently maintained manually in the pytorch/xla repo; instead, a new `xla_native_functions.yaml` file will maintain the list of ops that xla supports, and the codegen will generate the header file. This is more similar to what our other in-tree codegen does. Unfortunately I couldn't keep it byte-for-byte identical, because the original order of the signatures was arbitrary, but the file contains the same set of declarations with the same signatures.
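To illustrate the yaml-driven workflow, here is a minimal sketch of how a list of supported ops could drive header generation. This is not the actual `gen_backend_stubs.py`; the op names, signatures, and class layout are illustrative placeholders:

```python
# Hypothetical sketch of yaml-driven header generation (NOT the real
# gen_backend_stubs.py): take the set of ops a backend declares support
# for and emit one declaration per op into a generated header.

# Stand-in for the parsed contents of xla_native_functions.yaml.
supported_ops = {
    "abs": "at::Tensor abs(const at::Tensor& self);",
    "add.Tensor": "at::Tensor add(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha);",
}

def generate_header(ops: dict) -> str:
    # Sort deterministically so output is stable across runs (the PR notes
    # the hand-written header's ordering was arbitrary).
    decls = [sig for _, sig in sorted(ops.items())]
    body = "\n".join(f"  static {d}" for d in decls)
    return (
        "// @generated from xla_native_functions.yaml\n"
        "class AtenXlaType {\n"
        " public:\n"
        f"{body}\n"
        "};\n"
    )

header = generate_header(supported_ops)
```

The key design point is that the yaml file becomes the single source of truth for which ops the backend implements, and the header is a derived artifact.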


**Byte-for-byte changes**

`aten_xla_type_default.h` contains the headers for functions that fall back to CPU, same as before (identical sans comments and whitespace, and `Stream` -> `at::Stream` [they had a `using c10::Stream` somewhere]).

`aten_xla_type_default.cpp` contains a few things:
* kernels that fall back to CPU for any non-composite ops that xla doesn't support
* code-generated out kernels for a few operators, which just call into the xla-implemented kernel
* dispatcher registrations for all relevant kernels (xla kernels, codegen'd out wrappers, codegen'd fallbacks)

That file is byte-for-byte identical except for a few differences:
* namespaces: `c10::Device` -> `at::Device`, and `Stream` -> `at::Stream`
* The other diffs are smaller and easy to look over, I pasted them here: P408712288
  * We no longer manually unpack `TensorOptions` in the CPU fallbacks, and instead use the faithful C++ API
  * The codegen'd CPU fallback for `miopen_rnn_backward` had a small bug that this codegen fixes: it wasn't converting all output tensor arguments from CPU back to XLA properly.
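For a sense of what the codegen'd fallbacks do, here is a hedged sketch of the pattern: move every XLA input to CPU, call the reference CPU kernel, and move every output back. The template and the `bridge::` helper names below are simplified placeholders modeled on the description above, not the real generated code:

```python
# Illustrative sketch of the C++ a CPU fallback codegen might emit for one
# op. The helper names (bridge::XlaToCpuTensor etc.) and the fixed
# two-tensor signature are placeholders, not the actual codegen output.

CPU_FALLBACK_TEMPLATE = """\
std::tuple<at::Tensor, at::Tensor> AtenXlaTypeDefault::{name}(
    const at::Tensor& self, const at::Tensor& other) {{
  // 1. Move every XLA input to CPU.
  auto self_cpu = bridge::XlaToCpuTensor(self);
  auto other_cpu = bridge::XlaToCpuTensor(other);
  // 2. Run the reference CPU kernel.
  auto result = at::{name}(self_cpu, other_cpu);
  // 3. Move *all* outputs back to the XLA device -- the step the old
  //    miopen_rnn_backward fallback missed for some outputs.
  return std::make_tuple(
      bridge::CpuToXlaTensor(std::get<0>(result)),
      bridge::CpuToXlaTensor(std::get<1>(result)));
}}
"""

def gen_cpu_fallback(op_name: str) -> str:
    # Instantiate the template for a single operator name.
    return CPU_FALLBACK_TEMPLATE.format(name=op_name)
```

Generating this boilerplate per-op is exactly the logic the plan above intends to replace with a single boxed fallback kernel.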


**Data model changes**
* New `ExternalBackendMetadata`, `ExternalBackendFunction`, and `ExternalBackendFunctionsGroup` classes. `ExternalBackendMetadata` represents a line from `xla_native_functions.yaml`, and `ExternalBackendFunction/ExternalBackendFunctionsGroup` are data representations that link the metadata to the corresponding native functions.
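A minimal sketch of how these three classes could fit together, with field names guessed from the description above (the real definitions in the PR may differ):

```python
# Hypothetical data-model sketch; field names and types are guesses based
# on the PR description, not the actual class definitions.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ExternalBackendMetadata:
    # One line from xla_native_functions.yaml: an op the backend implements.
    operator_name: str

@dataclass(frozen=True)
class ExternalBackendFunction:
    # Links backend metadata (if any) to the in-tree native function entry.
    native_function: str  # stand-in for the real NativeFunction object
    metadata: Optional[ExternalBackendMetadata]

    @property
    def has_backend_kernel(self) -> bool:
        # No metadata means the op falls back to the codegen'd CPU kernel.
        return self.metadata is not None

@dataclass(frozen=True)
class ExternalBackendFunctionsGroup:
    # Groups variants of one op, mirroring how the in-tree codegen groups
    # functional/out pairs.
    functional: ExternalBackendFunction
    out: Optional[ExternalBackendFunction] = None
```

The split mirrors the in-tree codegen's approach: yaml metadata stays a thin record, and the `ExternalBackendFunction*` wrappers join it to the richer native-function model.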


**NOTE: What is staying/leaving after this PR**
The bulk of the changes live in `gen_external_fallbacks.py`. It contains:
* declarations/definitions for CPU fallbacks
* code-generated `out` wrappers (same as what XLA does)
* all dispatcher registrations

All of the CPU fallback logic will hopefully be killed in the near future, when I replace it with a boxed fallback. I opted to put it in a new file (`gen_external_fallbacks.py`) because (a) most of it will disappear soon, making it easier to disentangle later, and (b) it made merge conflicts (slightly) easier to deal with while working on this PR :)

When that logic is killed, I'm planning to move the remaining logic (codegen'd out wrappers + dispatcher registrations) into `register_dispatch_key.py`, since the logic will eventually look a lot more similar.




Differential Revision: [D27708346](https://our.internmc.facebook.com/intern/diff/D27708346)

[ghstack-poisoned]
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 16, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 21, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request Apr 21, 2021
@JackCaoG
Collaborator

JackCaoG commented Oct 1, 2021

@bdhirsh Do we still need this PR?

@bdhirsh
Contributor Author

bdhirsh commented Oct 1, 2021

this is old and can definitely be closed 😛

I had this open while I was experimenting with integrating the public codegen API with XLA

@bdhirsh bdhirsh closed this Oct 1, 2021