generate xla codegen in-tree #55050
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediations. As of commit ea9fa12 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:

| Job | Step | Action |
|---|---|---|
| Run tests | | 🔁 rerun |
not ready for review yet [ghstack-poisoned]
not ready for review yet Differential Revision: [D27708346](https://our.internmc.facebook.com/intern/diff/D27708346) [ghstack-poisoned]
```python
    return [r]

# TODO: how come ValuesView isn't a Sequence lol
return list(concatMap(flatten_pre_group, list(pre_grouped_native_functions.values())))
```
nope, identical. Just moved it into a function so gen_backend_stubs can re-use it. I guess the diff tool had trouble seeing that 😛
```python
    return f"{self.name}"


@dataclass(frozen=True)
class ExternalBackendMetadata:
```
An alternative data model, which would require fewer structs to be defined here, is to index the external backend metadata separately from ExternalBackendFunction, rather than insisting it be directly contained here. Then, anywhere you previously would have taken an ExternalBackendFunction, you instead take a NativeFunction and look up the ExternalBackendMetadata from the index.
Yeah, that reduces the amount of data class stuff that goes in here, at the cost of requiring all of the code that acts on ExternalBackendFunctions to do a little more work.
I have a slight preference for the beefed-up data model. The properties I defined on ExternalBackendFunction save a fair amount of redundant work for all of the callers, and will abstract any future logic we add to the metadata; e.g. the fact that having no metadata means the backend hasn't implemented that op, or that `primary` will figure out whether the backend favors out vs. functional operators. But let me know if you feel strongly one way or the other.
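For concreteness, here's a rough sketch of the two data models being weighed. The class names mirror the PR, but the fields and helpers are simplified stand-ins, not the actual tools/codegen definitions:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class NativeFunction:
    name: str  # stand-in for the real NativeFunction

@dataclass(frozen=True)
class ExternalBackendMetadata:
    operator: str
    structured: bool

# Option A (this PR): the metadata is embedded in a wrapper object, and helper
# properties hide what "no metadata" means from the callers.
@dataclass(frozen=True)
class ExternalBackendFunction:
    native_function: NativeFunction
    metadata: Optional[ExternalBackendMetadata]

    @property
    def backend_has_kernel(self) -> bool:
        return self.metadata is not None

# Option B (the alternative): keep NativeFunction as-is and look the metadata
# up in a separate index wherever it is needed.
metadata_index: Dict[str, ExternalBackendMetadata] = {}

def backend_has_kernel(f: NativeFunction) -> bool:
    return f.name in metadata_index
```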
```python
if kind == SchemaKind.out or kind == SchemaKind.inplace:
    assert self.metadata is None or not self.metadata.structured, \
        "Found an out/inplace operator marked with the structured keyword." \
        f" Only functional operators can be marked as structured. operator={str(self.native_function.func.name)}"
```
Isn't there an invariant that native_function has to agree with metadata.operator?
yep, should have added that before
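A compact illustration of the invariant in question, with hypothetical field names (the real check would live on the PR's ExternalBackendFunction):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ExternalBackendFunction:
    operator_name: str                       # stand-in for native_function.func.name
    metadata_operator: Optional[str] = None  # stand-in for metadata.operator

    def __post_init__(self) -> None:
        # The metadata, if present, must describe the same operator it is attached to.
        if self.metadata_operator is not None:
            assert self.operator_name == self.metadata_operator, \
                f"metadata is for {self.metadata_operator}, but attached to {self.operator_name}"
```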
```python
import pathlib
import argparse
import os
import yaml
```
For ease of in-tree development, we're probably going to want to set up some tests that exercise this code path on simplified YAML files that cover important functionality.
In particular, unlike the in-tree native_functions.yaml, we are going to need testing for error cases, making sure we give appropriate errors in those cases. Expect tests can help make this painless.
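A sketch of what such a test could look like: drive the parsing step over a tiny in-memory YAML document and check the error message. `parse_backend_yaml` is a stand-in for whatever entry point gen_backend_stubs.py ends up exposing, and this uses a plain unittest assertion rather than a literal expect test:

```python
import unittest
import yaml

def parse_backend_yaml(text: str) -> dict:
    # Stand-in parser: load the yaml and check required keys, erroring loudly otherwise.
    values = yaml.safe_load(text)
    if not isinstance(values, dict):
        raise AssertionError(f"expected a yaml mapping, got: {values!r}")
    for key in ('backend', 'supported'):
        if key not in values:
            raise AssertionError(f"missing required key: {key}")
    return values

class TestBackendYaml(unittest.TestCase):
    def test_missing_supported_key(self) -> None:
        bad_yaml = "backend: XLA\nautograd:\n- add\n"
        with self.assertRaisesRegex(AssertionError, "missing required key: supported"):
            parse_backend_yaml(bad_yaml)

if __name__ == "__main__":
    unittest.main()
```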
tools/codegen/gen_backend_stubs.py (Outdated)

```python
backend = yaml_values['backend']
supported = yaml_values['supported']
supported_autograd = yaml_values['autograd']
```
As this YAML is user provided it is worth being more paranoid about the inputs provided here. Good input validation here will help save you from weird error messages later. NativeFunction.from_yaml does a pretty good job of validating its arguments and giving useful error information when things have gone wrong, and I recommend modeling the code here after it, if you don't have ideas on your own.
You're right, thanks. I'm going to make the error handling a little more robust in this PR. I'll probably add some explicit tests in a followup.
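One possible shape for that defensive parsing, in the spirit of the validation style described above; the `backend`/`supported`/`autograd` keys come from the snippet earlier, everything else is illustrative:

```python
from typing import Any, Dict

def parse_backend_yaml(yaml_values: Dict[str, Any]) -> Dict[str, Any]:
    # Reject typo'd or unknown keys up front, with a message that lists what is allowed.
    valid_keys = {'backend', 'supported', 'autograd'}
    unexpected = set(yaml_values.keys()) - valid_keys
    assert not unexpected, \
        f"unexpected keys in the backend yaml: {sorted(unexpected)}; only {sorted(valid_keys)} are allowed"

    backend = yaml_values.get('backend')
    assert isinstance(backend, str), f"'backend' must be a string, got: {backend!r}"

    supported = yaml_values.get('supported') or []
    assert isinstance(supported, list) and all(isinstance(op, str) for op in supported), \
        f"'supported' must be a list of operator names, got: {supported!r}"

    supported_autograd = yaml_values.get('autograd') or []
    assert isinstance(supported_autograd, list), \
        f"'autograd' must be a list of operator names, got: {supported_autograd!r}"

    return {'backend': backend, 'supported': supported, 'autograd': supported_autograd}
```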
tools/codegen/gen_backend_stubs.py (Outdated)

```python
template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates")


def make_file_manager(install_dir: str) -> FileManager:
    return FileManager(install_dir=install_dir, template_dir=template_dir, dry_run=False)
```
NB: file manager is pointless unless you let people dry_run it. The abstraction solely exists to let people dry run the codegen script so that they can find out what files it writes out.
I thought FileManager (in non-dry-run mode) also suppressed output if it was unchanged? (Not that supporting dry_run here is a bad idea)
ah, got it. It doesn't seem unreasonable to allow an optional dry_run flag to the external codegen, similar to what gen.py has.
And @bhosmer it looks like you're right; if dry_run is true, we never write to the file. If dry_run is false, we only write to it if the contents have changed.
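Not the real FileManager, but a minimal sketch of the two behaviors just described: a dry run never writes, and a real run only writes when the contents actually changed:

```python
import os

def write_if_changed(path: str, contents: str, dry_run: bool = False) -> None:
    # Read the existing file, if any, so we can tell whether anything changed.
    old_contents = None
    if os.path.exists(path):
        with open(path, 'r') as f:
            old_contents = f.read()
    if dry_run:
        # Dry run: report what would happen, never touch the filesystem.
        unchanged = " (unchanged)" if contents == old_contents else ""
        print(f"[dry run] would write {path}{unchanged}")
        return
    if contents != old_contents:
        with open(path, 'w') as f:
            f.write(contents)
```

Exposing this as an optional `--dry-run` flag on the external codegen, as suggested above, would just mean threading a boolean from argparse through to the FileManager constructor.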
```python
@method_with_native_function
def __call__(self, g: Union[ExternalBackendFunctionsGroup, ExternalBackendFunction]) -> List[str]:


def gen_out_wrapper(g: ExternalBackendFunctionsGroup) -> Optional[str]:
```
Why not make these methods on GenExternalAtenFallback? It would save you a level of indentation, and bring this file more in line with the prevailing style of dest modules.
One reservation that I had was for generating out wrappers: given an ExternalBackendFunctionsGroup, gen_unstructured_external() can't really operate on each ExternalBackendFunction independently: the codegen is supposed to generate an out wrapper if the out ExternalBackendFunction doesn't have ExternalMetadata, and that logic needs to know the name of the functional op (in order to call it). When the functions are nested as closures like they are now, gen_unstructured_external() has access to the higher-scope ExternalBackendFunctionsGroup.
Locally I ended up just doing what you said, but I had to update the signature to gen_unstructured_external(f: ExternalBackendFunction, g: ExternalBackendFunctionGroup), so it has all of the information it needs. I think it's an overall win for readability, since that level of indentation made things pretty confusing.
(I'm actually not gonna make this change for now, and just land the PR as is. lmk if you'd still like the change though)
Well, what you said here doesn't seem to apply to gen_out_wrapper. And it is totally reasonable to pass it in as an extra argument, that's what we do in other places.
I'm not too allergic to nested closures, but I know @bhosmer has been on my case about them in the past, and explicitly lifting out the closures does make it easier to reason about what information is propagated and what is not.
Removing the nested closures in this file will also be that much easier after I finish moving the gen_out_wrapper logic out of this file as part of #56835. I can do that in a follow-up.
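A simplified sketch of the lifted-out signature discussed above; the class and helper names mirror the PR, but the bodies are placeholders:

```python
from typing import List, Optional

def gen_out_wrapper(g: "ExternalBackendFunctionsGroup") -> Optional[str]:
    # Placeholder for the real out-wrapper codegen.
    return None

def gen_unstructured_external(
    f: "ExternalBackendFunction",
    g: Optional["ExternalBackendFunctionsGroup"] = None,
) -> List[str]:
    # The out wrapper has to call the functional variant, whose name only the
    # group knows -- so the group is passed in explicitly instead of being
    # captured by a closure.
    if g is not None and f is g.out and f.metadata is None:
        wrapper = gen_out_wrapper(g)
        return [wrapper] if wrapper is not None else []
    # Placeholder for the unstructured kernel / registration codegen.
    return []
```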
```python
dispatcher_sig = DispatcherSignature.from_schema(g.out.native_function.func)
name = dispatcher_sig.name()


dispatcher_order_args = list(dispatcher.jit_arguments(g.out.native_function.func))
```
The necessity to list() here seems like bad design on the part of jit_arguments' return type.
```python
def get_device_param(args: List[Argument]) -> str:
    # TODO: the XLA codegen has specific precedence rules when determining which tensor argument
    # to use as the device argument.
```
for the XLA case, do they expect all tensors to be on the same device? :)
Good question, I dunno :) maybe @ailzhang does.
We probably want to keep the logic for choosing which input to get the device from consistent either way though, right?
It looks like mixed xla/cpu devices are allowed, at least:

```python
>>> import torch_xla.core.xla_model as xm
>>> dev = xm.xla_device()
>>> a = torch.ones(3, device=dev)
>>> b = torch.ones(3)
>>> a + b
tensor([2., 2., 2.], device='xla:0')
>>> a.device
device(type='xla', index=0)
>>> b.device
device(type='cpu')
```
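For illustration only, here is one way a "which argument supplies the device" precedence rule could be written; the actual rule used by the XLA codegen may differ (that is exactly the TODO above):

```python
from typing import List, NamedTuple, Optional

class Argument(NamedTuple):
    name: str
    type: str  # e.g. 'Tensor', 'Tensor[]', 'TensorOptions'

def get_device_param(args: List[Argument]) -> Optional[str]:
    # Hypothetical precedence: an explicit TensorOptions argument wins,
    # then the first Tensor, then the first TensorList.
    for preferred_type in ('TensorOptions', 'Tensor', 'Tensor[]'):
        for a in args:
            if a.type == preferred_type:
                return a.name
    return None

# e.g. get_device_param([Argument('self', 'Tensor'), Argument('other', 'Tensor')]) -> 'self'
```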
```python
namespace = 'AtenXlaTypeDefault'
payload = f"static_cast<{dispatcher_sig.decl(func_ptr_cast=True)}>(&{namespace}::{name})"
return f' m.impl("{f.native_function.func.name}", {payload});\n'
```
assert something about target here?
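A minimal sketch of that suggestion: make the codegen target explicit and assert on it before emitting a registration. The `Target` enum here is a stand-in, not necessarily the one in tools/codegen:

```python
import enum

class Target(enum.Enum):
    # Stand-in for the codegen's notion of "what are we emitting right now".
    REGISTRATION = enum.auto()
    NAMESPACED_DEFINITION = enum.auto()

def gen_registration(target: Target, op_name: str, namespace: str, decl: str) -> str:
    # Fail loudly if this branch is ever reached for a non-registration target.
    assert target is Target.REGISTRATION, f"unexpected codegen target: {target}"
    payload = f"static_cast<{decl}>(&{namespace}::{op_name})"
    return f'  m.impl("{op_name}", {payload});\n'
```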
Substantively this PR looks pretty good. What would you like to do? Land as is and start merging fixups, or keep working on this until you're closer?

I'm hoping to land this PR sooner rather than later, but a bunch of the feedback looks like small things that will be easier to fix directly in the PR. I'm leaning towards making most of the quick fixes first.
This PR ports the codegen logic from the [XLA repo](https://github.com/pytorch/xla/blob/master/scripts/gen.py) to be in-tree. This particular PR maintains (almost) byte-for-byte compatibility with XLA's codegen, with the differences highlighted below. That means that there's some xla-specific stuff in the codegen, which will be removed in subsequent PRs.

**Plan**
- [ ] (this PR) merge byte-for-byte version of XLA codegen in-tree
- [ ] Get XLA to use the in-tree codegen
- [ ] Remove the xla-specific `bridge::` API from the codegen
- [ ] Kill the CPU fallback codegen in favor of a boxed kernel
- [ ] Remove other xla-specific bits, like the file names (`aten_xla_type.h`)
- [ ] update codegen to properly handle generating `inplace`/`out`
- [ ] update codegen to handle structured

**New codegen workflow**

I have a corresponding PR in the `pytorch/xla` repo that uses these changes [here](pytorch/xla#2869). After checking out pytorch/xla from source, you can run `python ../tools/codegen/gen_backend_stubs.py --output_dir="torch_xla/csrc/" --source_yaml="xla_native_functions.yaml"`. It spits out 3 files, which are close-to-byte-for-byte identical to what pytorch/xla currently has:
* `torch_xla/csrc/aten_xla_type.h`
* `torch_xla/csrc/aten_xla_type_default.h`
* `torch_xla/csrc/aten_xla_type_default.cpp`

`aten_xla_type.h` is currently maintained manually in the pytorch/xla repo; instead, a new `xla_native_functions.yaml` file will maintain the list of ops that xla supports, and the codegen will generate the header file. This is more similar to what our other in-tree codegen does. Unfortunately I couldn't keep it byte-for-byte identical because the original order of the signatures was arbitrary, but the file contains the same set of declarations with the same signatures.

**Byte-for-byte changes**

`aten_xla_type_default.h` contains the headers for functions that fall back to CPU, same as before (identical sans comments and whitespace, and `Stream` -> `at::Stream` [they had a `using c10::Stream` somewhere]).

`aten_xla_type_default.cpp` contains a few things:
* kernels that fall back to CPU for any non-composite ops that xla doesn't support
* code-generated out kernels for a few operators, which just call into the xla-implemented kernel
* dispatcher registrations for all relevant kernels (xla kernels, codegen'd out wrappers, codegen'd fallbacks)

That file is byte-for-byte identical with a few differences:
* namespaces: `c10::Device` -> `at::Device`, and `Stream` -> `at::Stream`
* The other diffs are smaller and easy to look over; I pasted them here: P408712288
* We no longer manually unpack `TensorOptions` in the CPU fallbacks, and instead use the faithful C++ API
* The codegen'd CPU fallback for `miopen_rnn_backward` had a small bug that this codegen fixes: it wasn't converting all output tensor arguments from CPU back to XLA properly.

**Data model changes**
* New `ExternalBackendMetadata`, `ExternalBackendFunction`, and `ExternalBackendFunctionsGroup` classes. `ExternalBackendMetadata` represents a line from `xla_native_functions.yaml`, and `ExternalBackendFunction`/`ExternalBackendFunctionsGroup` are data representations that link the metadata to the corresponding native functions.

**NOTE: What is staying/leaving after this PR**

The bulk of the changes live in `gen_external_fallbacks.py`. It contains:
* declarations/definitions for CPU fallbacks
* code-generated `out` wrappers (same as what XLA does)
* all dispatcher registrations

All of the CPU fallback logic will hopefully be killed in the near future, when I replace it with a boxed fallback. I opted to put it in a new file (`gen_external_fallbacks.py`) because (a) most of it will disappear soon, making it easier to disentangle later, and (b) it made merge conflicts (slightly) easier to deal with while working on this PR :)

When that logic is killed, I'm planning to move the remaining logic (codegen'd out wrappers + dispatcher registrations) into `register_dispatch_key.py`, since the logic will eventually look a lot more similar.

Differential Revision: [D27708346](https://our.internmc.facebook.com/intern/diff/D27708346)

[ghstack-poisoned]
This pull request has been reverted by 90e532f.
Summary: Pull Request resolved: #55050

not ready for review yet

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27708346

Pulled By: bdhirsh

fbshipit-source-id: 2289edd641f30277d7561cf2d48ec69c6a2137a9
This is a re-land of #55050, but with a small change (pasted the patch below). I confirmed that RegistrationDeclarations.yaml doesn't change after this PR with the following:

```
diff -Naur build/aten/src/ATen/RegistrationDeclarations.h ../pytorch/build/aten/src/ATen/RegistrationDeclarations.h
```

patch:

```
--- a/tools/codegen/api/types.py
+++ b/tools/codegen/api/types.py
@@ -41,12 +41,12 @@
 tensorListT = BaseCppType('at', 'TensorList')
 dimnameT = BaseCppType('at', 'Dimname')
 dimnameListT = BaseCppType('at', 'DimnameList')
 layoutT = BaseCppType('at', 'Layout')
-deviceT = BaseCppType('c10', 'Device')
+deviceT = BaseCppType('at', 'Device')
 scalarT = BaseCppType('at', 'Scalar')
 memoryFormatT = BaseCppType('at', 'MemoryFormat')
 qschemeT = BaseCppType('at', 'QScheme')
 storageT = BaseCppType('at', 'Storage')
-streamT = BaseCppType('', 'Stream')
+streamT = BaseCppType('at', 'Stream')
 intArrayRefT = BaseCppType('at', 'IntArrayRef')
 tensorOptionsT = BaseCppType('at', 'TensorOptions')
 typeAndSizeT = BaseCppType('torch::autograd::generated', 'TypeAndSize')
```

I must have accidentally added the `c10` when I was debugging, to reduce the number of byte-for-byte changes in `aten_xla_type_default.cpp`. That caused a change to `RegistrationDeclarations.yaml`, which is BC-breaking for any downstream consumers. Updating it to ensure that RegistrationDeclarations.yaml is completely unchanged.

This reverts commit 90e532f.

Differential Revision: [D27915305](https://our.internmc.facebook.com/intern/diff/D27915305)

[ghstack-poisoned]
ghstack-source-id: 16f89ce Pull Request resolved: pytorch/pytorch#55050
This PR ports the codegen logic from the [XLA repo](https://github.com/pytorch/xla/blob/master/scripts/gen.py) to be in-tree.
This particular PR maintains (almost) byte-for-byte compatibility with XLA's codegen, with the differences highlighted below. That means that there's some xla-specific stuff in the codegen, which will be removed in subsequent PRs.

Plan
- [ ] (this PR) merge byte-for-byte version of XLA codegen in-tree
- [ ] Get XLA to use the in-tree codegen
- [ ] Remove the xla-specific `bridge::` API from the codegen
- [ ] Kill the CPU fallback codegen in favor of a boxed kernel
- [ ] Remove other xla-specific bits, like the file names (`aten_xla_type.h`)
- [ ] update codegen to properly handle generating `inplace`/`out`
- [ ] update codegen to handle structured

New codegen workflow

I have a corresponding PR in the `pytorch/xla` repo that uses these changes [here](pytorch/xla#2869). After checking out pytorch/xla from source, you can run `python ../tools/codegen/gen_backend_stubs.py --output_dir="torch_xla/csrc/" --source_yaml="xla_native_functions.yaml"`. It spits out 3 files, which are close-to-byte-for-byte identical to what pytorch/xla currently has:
* `torch_xla/csrc/aten_xla_type.h`
* `torch_xla/csrc/aten_xla_type_default.h`
* `torch_xla/csrc/aten_xla_type_default.cpp`

`aten_xla_type.h` is currently maintained manually in the pytorch/xla repo; instead, a new `xla_native_functions.yaml` file will maintain the list of ops that xla supports, and the codegen will generate the header file. This is more similar to what our other in-tree codegen does. Unfortunately I couldn't keep it byte-for-byte identical because the original order of the signatures was arbitrary, but the file contains the same set of declarations with the same signatures.

Byte-for-byte changes

`aten_xla_type_default.h` contains the headers for functions that fall back to CPU, same as before (identical sans comments and whitespace, and `Stream` -> `at::Stream` [they had a `using c10::Stream` somewhere]).

`aten_xla_type_default.cpp` contains a few things:
* kernels that fall back to CPU for any non-composite ops that xla doesn't support
* code-generated out kernels for a few operators, which just call into the xla-implemented kernel
* dispatcher registrations for all relevant kernels (xla kernels, codegen'd out wrappers, codegen'd fallbacks)

That file is byte-for-byte identical with a few differences:
* namespaces: `c10::Device` -> `at::Device`, and `Stream` -> `at::Stream`
* The other diffs are smaller and easy to look over; I pasted them here: P408712288
* We no longer manually unpack `TensorOptions` in the CPU fallbacks, and instead use the faithful C++ API
* The codegen'd CPU fallback for `miopen_rnn_backward` had a small bug that this codegen fixes: it wasn't converting all output tensor arguments from CPU back to XLA properly.

Data model changes
* New `ExternalBackendMetadata`, `ExternalBackendFunction`, and `ExternalBackendFunctionsGroup` classes. `ExternalBackendMetadata` represents a line from `xla_native_functions.yaml`, and `ExternalBackendFunction`/`ExternalBackendFunctionsGroup` are data representations that link the metadata to the corresponding native functions.

NOTE: What is staying/leaving after this PR

The bulk of the changes live in `gen_external_fallbacks.py`. It contains:
* declarations/definitions for CPU fallbacks
* code-generated `out` wrappers (same as what XLA does)
* all dispatcher registrations

All of the CPU fallback logic will hopefully be killed in the near future, when I replace it with a boxed fallback. I opted to put it in a new file (`gen_external_fallbacks.py`) because (a) most of it will disappear soon, making it easier to disentangle later, and (b) it made merge conflicts (slightly) easier to deal with while working on this PR :)

When that logic is killed, I'm planning to move the remaining logic (codegen'd out wrappers + dispatcher registrations) into `register_dispatch_key.py`, since the logic will eventually look a lot more similar.

Stack from ghstack:

Differential Revision: D27708346