Closed

Force-pushed from 3348b02 to 75bebab
bdhirsh added a commit to pytorch/pytorch that referenced this pull request on Apr 16, 2021
This PR ports the codegen logic from the [XLA repo](https://github.com/pytorch/xla/blob/master/scripts/gen.py) to live in-tree. This particular PR maintains (almost) byte-for-byte compatibility with XLA's codegen, with the differences highlighted below. That means there is some xla-specific logic in the codegen, which will be removed in subsequent PRs.

**Plan**

- [ ] (this PR) merge a byte-for-byte version of the XLA codegen in-tree
- [ ] Get XLA to use the in-tree codegen
- [ ] Remove the xla-specific `bridge::` API from the codegen
- [ ] Kill the CPU fallback codegen in favor of a boxed kernel
- [ ] Remove other xla-specific bits, like the file names (`aten_xla_type.h`)
- [ ] Update the codegen to properly handle generating `inplace`/`out` variants
- [ ] Update the codegen to handle structured kernels

**New codegen workflow**

I have a corresponding PR in the `pytorch/xla` repo that uses these changes [here](pytorch/xla#2869). After checking out pytorch/xla from source, you can run `python ../tools/codegen/gen_backend_stubs.py --output_dir="torch_xla/csrc/" --source_yaml="xla_native_functions.yaml"`. It emits 3 files, which are close to byte-for-byte identical to what pytorch/xla currently has:

* `torch_xla/csrc/aten_xla_type.h`
* `torch_xla/csrc/aten_xla_type_default.h`
* `torch_xla/csrc/aten_xla_type_default.cpp`

`aten_xla_type.h` is currently maintained manually in the pytorch/xla repo; instead, a new `xla_native_functions.yaml` file will maintain the list of ops that xla supports, and the codegen will generate the header file. This is more similar to what our other in-tree codegen does. Unfortunately I couldn't keep it byte-for-byte identical, because the original order of the signatures was arbitrary, but the file contains the same set of declarations with the same signatures.
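To make the workflow concrete, here is a minimal, hypothetical sketch of the kind of input `gen_backend_stubs.py` consumes: a backend yaml naming the backend and listing the ops it implements. The exact schema of `xla_native_functions.yaml` is an assumption here (and the tiny hand-rolled parser is purely illustrative, not the real codegen's yaml handling):

```python
# Illustrative only: a minimal "backend yaml" shape and a tiny parser for it.
# The real xla_native_functions.yaml schema may differ.

EXAMPLE_YAML = """\
backend: XLA
supported:
- abs
- add.Tensor
- _copy_from
"""

def parse_backend_yaml(text: str) -> dict:
    """Parse the flat 'backend:' + 'supported:' list shape assumed above."""
    backend = None
    supported = []
    in_supported = False
    for line in text.splitlines():
        if line.startswith("backend:"):
            backend = line.split(":", 1)[1].strip()
            in_supported = False
        elif line.startswith("supported:"):
            in_supported = True
        elif in_supported and line.startswith("- "):
            # Each entry names an op the backend implements; anything in
            # native_functions.yaml but absent here gets a CPU fallback.
            supported.append(line[2:].strip())
    return {"backend": backend, "supported": supported}

info = parse_backend_yaml(EXAMPLE_YAML)
print(info["backend"], info["supported"])  # → XLA ['abs', 'add.Tensor', '_copy_from']
```

Ops listed under `supported` would get declarations in the generated `aten_xla_type.h`; ops that appear in `native_functions.yaml` but not in this list are the ones that receive code-generated CPU fallbacks.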
**Byte-for-byte changes**

`aten_xla_type_default.h` contains the headers for functions that fall back to CPU, same as before (identical sans comments and whitespace, and `Stream` -> `at::Stream` [they had a `using c10::Stream` somewhere]).

`aten_xla_type_default.cpp` contains a few things:

* kernels that fall back to CPU for any non-composite ops that xla doesn't support
* code-generated `out` kernels for a few operators, which just call into the xla-implemented kernel
* dispatcher registrations for all relevant kernels (xla kernels, codegen'd `out` wrappers, codegen'd fallbacks)

That file is byte-for-byte identical, with a few differences:

* namespaces: `c10::Device` -> `at::Device`, and `Stream` -> `at::Stream`
* The other diffs are smaller and easy to look over; I pasted them here: P408712288
* We no longer manually unpack `TensorOptions` in the CPU fallbacks, and instead use the faithful C++ API
* The codegen'd CPU fallback for `miopen_rnn_backward` had a small bug that this codegen fixes: it wasn't converting all output tensor arguments from CPU back to XLA properly

**Data model changes**

New `ExternalBackendMetadata`, `ExternalBackendFunction`, and `ExternalBackendFunctionsGroup` classes. `ExternalBackendMetadata` represents a line from `xla_native_functions.yaml`, and `ExternalBackendFunction`/`ExternalBackendFunctionsGroup` are data representations that link the metadata to the corresponding native functions.

**NOTE: What is staying/leaving after this PR**

The bulk of the changes live in `gen_external_fallbacks.py`. It contains:

* declarations/definitions for CPU fallbacks
* code-generated `out` wrappers (same as what XLA does)
* all dispatcher registrations

All of the CPU fallback logic will hopefully be killed in the near future, when I replace it with a boxed fallback.
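The data model described above can be sketched as plain dataclasses. Only the three class names come from this PR; every field name, type, and method below is an illustrative assumption about how the metadata might link to native functions, not the PR's exact schema:

```python
# Hedged sketch of the new data model; fields/methods are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ExternalBackendMetadata:
    # Represents one line from xla_native_functions.yaml: an op the
    # external backend implements, plus which backend it belongs to.
    operator_name: str
    backend: str  # e.g. "XLA"

@dataclass(frozen=True)
class ExternalBackendFunction:
    # Links backend metadata to the corresponding in-tree native function.
    # (A string stands in here for the real NativeFunction object.)
    native_function: str
    metadata: Optional[ExternalBackendMetadata]

    @property
    def is_supported(self) -> bool:
        # Ops with no metadata entry get a code-generated CPU fallback.
        return self.metadata is not None

@dataclass(frozen=True)
class ExternalBackendFunctionsGroup:
    # Groups related variants (e.g. functional and out), mirroring how the
    # in-tree codegen groups native functions.
    functional: ExternalBackendFunction
    out: ExternalBackendFunction

supported = ExternalBackendFunction("abs", ExternalBackendMetadata("abs", "XLA"))
fallback = ExternalBackendFunction("tan", None)
print(supported.is_supported, fallback.is_supported)  # → True False
```

The useful property of this shape is that "does the backend implement this op?" becomes a simple presence check on the metadata, which is exactly the decision the codegen needs when choosing between emitting a stub declaration and emitting a CPU fallback.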
I opted to put it in a new file (`gen_external_fallbacks.py`) because (a) most of it will disappear soon, which makes it easier to disentangle later, and (b) it made merge conflicts (slightly) easier to deal with while working on this PR :) When that logic is killed, I'm planning to move the remaining logic (codegen'd `out` wrappers + dispatcher registrations) into `register_dispatch_key.py`, since the logic will eventually look a lot more similar.

Differential Revision: [D27708346](https://our.internmc.facebook.com/intern/diff/D27708346)

[ghstack-poisoned]
bdhirsh added a commit to pytorch/pytorch that referenced this pull request on Apr 16, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request on Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request on Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request on Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request on Apr 20, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request on Apr 21, 2021
bdhirsh added a commit to pytorch/pytorch that referenced this pull request on Apr 21, 2021
**Collaborator:** @bdhirsh Do we still need this PR?
**Author (Contributor):** this is old and can definitely be closed 😛 I had this open when I was experimenting with integrating the public codegen API with XLA