generate xla codegen in-tree#55050

Closed
bdhirsh wants to merge 20 commits into gh/bdhirsh/96/base from gh/bdhirsh/96/head

Conversation

@bdhirsh
Collaborator

@bdhirsh bdhirsh commented Mar 31, 2021

This PR ports the codegen logic from the [XLA repo](https://github.com/pytorch/xla/blob/master/scripts/gen.py) to be in-tree.

This particular PR maintains (almost) byte-for-byte compatibility with XLA's codegen, with the differences highlighted below. That means that there's some xla-specific stuff in the codegen, which will be removed in subsequent PRs.

Plan
[ ] (this PR) merge byte-for-byte version of XLA codegen in-tree
[ ] Get XLA to use the in-tree codegen
[ ] Remove the xla-specific `bridge::` API from the codegen
[ ] Kill the CPU fallback codegen in favor of a boxed kernel
[ ] Remove other xla-specific bits, like the file names (`aten_xla_type.h`)
[ ] Update codegen to properly handle generating `inplace`/`out`
[ ] Update codegen to handle structured

New codegen workflow
I have a corresponding PR in the `pytorch/xla` repo that uses these changes [here](pytorch/xla#2869).
After checking out pytorch/xla from source, you can run `python ../tools/codegen/gen_backend_stubs.py --output_dir="torch_xla/csrc/" --source_yaml="xla_native_functions.yaml"`. It spits out 3 files, which are close to byte-for-byte identical to what pytorch/xla currently has:

  • `torch_xla/csrc/aten_xla_type.h`
  • `torch_xla/csrc/aten_xla_type_default.h`
  • `torch_xla/csrc/aten_xla_type_default.cpp`

`aten_xla_type.h` is currently maintained manually in the pytorch/xla repo; instead, a new `xla_native_functions.yaml` file will maintain the list of ops that xla supports, and the codegen will generate the header file. This is more similar to what our other in-tree codegen does. Unfortunately I couldn't keep it byte-for-byte identical because the original order of the signatures was arbitrary, but the file contains the same set of declarations with the same signatures.

Byte-for-byte changes

`aten_xla_type_default.h` contains the headers for functions that fall back to CPU, same as before (identical sans comments and whitespace, and `Stream` -> `at::Stream` [they had a `using c10::Stream` somewhere]).

`aten_xla_type_default.cpp` contains a few things:

  • kernels that fall back to CPU for any non-composite ops that xla doesn't support
  • code-generated out kernels for a few operators, which just call into the xla-implemented kernel
  • dispatcher registrations for all relevant kernels (xla kernels, codegen'd out wrappers, codegen'd fallbacks)

That file is byte-for-byte identical with a few differences:

  • namespaces: `c10::Device` -> `at::Device`, and `Stream` -> `at::Stream`
  • The other diffs are smaller and easy to look over, I pasted them here: P408712288
    • We no longer manually unpack TensorOptions in the CPU fallbacks, and instead use the faithful C++ API
    • The codegen'd CPU fallback for miopen_rnn_backward had a small bug that this codegen fixes: it wasn't converting all output tensor arguments from CPU back to XLA properly.
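For context, a codegen'd CPU fallback conceptually moves inputs to CPU, calls the CPU kernel, and converts every output back to the backend device. A toy sketch of that shape (all types and names here are illustrative stand-ins, not the actual generated code):

```python
from dataclasses import dataclass
from typing import Tuple

# Stand-in Tensor for illustration only; not a real ATen type.
@dataclass
class Tensor:
    data: float
    device: str

    def to(self, device: str) -> "Tensor":
        return Tensor(self.data, device)

def cpu_kernel(a: Tensor, b: Tensor) -> Tuple[Tensor, Tensor]:
    # A fake multi-output kernel, standing in for something like
    # miopen_rnn_backward.
    return Tensor(a.data + b.data, "cpu"), Tensor(a.data - b.data, "cpu")

def cpu_fallback(a: Tensor, b: Tensor) -> Tuple[Tensor, ...]:
    orig_device = a.device
    outs = cpu_kernel(a.to("cpu"), b.to("cpu"))
    # The bug class described above is skipping this step for some outputs;
    # every output tensor must be converted back to the original device.
    return tuple(out.to(orig_device) for out in outs)

results = cpu_fallback(Tensor(1.0, "xla:0"), Tensor(2.0, "xla:0"))
```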

Data model changes

  • New `ExternalBackendMetadata`, `ExternalBackendFunction`, and `ExternalBackendFunctionsGroup` classes. `ExternalBackendMetadata` represents a line from `xla_native_functions.yaml`, and `ExternalBackendFunction`/`ExternalBackendFunctionsGroup` are data representations that link the metadata to the corresponding native functions.
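A minimal sketch of how these classes relate (field names here are illustrative, not the actual tools/codegen definitions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NativeFunction:
    # Stand-in for the existing in-tree NativeFunction data model.
    name: str

@dataclass(frozen=True)
class ExternalBackendMetadata:
    # Conceptually: one line from xla_native_functions.yaml.
    operator: str
    is_autograd: bool

@dataclass(frozen=True)
class ExternalBackendFunction:
    # Links a native function to its (optional) backend metadata;
    # metadata being None means the backend doesn't implement the op.
    native_function: NativeFunction
    metadata: Optional[ExternalBackendMetadata]

nf = NativeFunction(name="add.Tensor")
meta = ExternalBackendMetadata(operator="add.Tensor", is_autograd=False)
f = ExternalBackendFunction(native_function=nf, metadata=meta)
```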

NOTE: What is staying/leaving after this PR
The bulk of the changes live in `gen_external_fallbacks.py`. It contains:

  • declarations/definitions for CPU fallbacks
  • code-generated out wrappers (same as what XLA does)
  • all dispatcher registrations

All of the CPU fallback logic will hopefully be killed in the near future, when I replace it with a boxed fallback. I opted to put it in a new file (`gen_external_fallbacks.py`) because (a) most of it will disappear soon, making it easier to disentangle later, and (b) it made merge conflicts (slightly) easier to deal with while working on this PR :)

When that logic is killed, I'm planning to move the remaining logic (codegen'd out wrappers + dispatcher registrations) into `register_dispatch_key.py`, since the logic will eventually look a lot more similar.

Differential Revision: D27708346

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 31, 2021

💊 CI failures summary and remediations

As of commit ea9fa12 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 20 22:28:23 Failed to generate ATEN binding...b/jenkins/workspace/xla/scripts/generate_code.sh']
Apr 20 22:28:22 Failed to generate wrapper for FuncDef(cpp_sig='Tensor fft_rfftfreq(int64_t n, double d, c10::optional<ScalarType> dtype, c10::optional<Layout> layout, c10::optional<c10::Device> device, c10::optional<bool> pin_memory)', aten_sig='aten::fft_rfftfreq(int n, float d=1.0, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor', dispatch=False, default=True): 
Apr 20 22:28:22 Generated 1908 wrappers for /var/lib/jenkins/workspace/xla/scripts/../../build/aten/src/ATen/RegistrationDeclarations.h
Apr 20 22:28:22 AtenXlaType function missed override: Tensor empty(IntArrayRef size, c10::optional<ScalarType> dtype, c10::optional<Layout> layout, c10::optional<Device> device, c10::optional<bool> pin_memory, c10::optional<MemoryFormat> memory_format); // empty(IntArrayRef,c10::optional<ScalarType>,c10::optional<Layout>,c10::optional<Device>,c10::optional<bool>,c10::optional<MemoryFormat>)->Tensor
Apr 20 22:28:22 AtenXlaType function missed override: Tensor empty_strided(IntArrayRef size, IntArrayRef stride, c10::optional<ScalarType> dtype, c10::optional<Layout> layout, c10::optional<Device> device, c10::optional<bool> pin_memory); // empty_strided(IntArrayRef,IntArrayRef,c10::optional<ScalarType>,c10::optional<Layout>,c10::optional<Device>,c10::optional<bool>)->Tensor
Apr 20 22:28:22 Traceback (most recent call last):
Apr 20 22:28:22   File "/var/lib/jenkins/workspace/xla/scripts/gen.py", line 1195, in <module>
Apr 20 22:28:22     generate(args)
Apr 20 22:28:22   File "/var/lib/jenkins/workspace/xla/scripts/gen.py", line 1165, in generate
Apr 20 22:28:22     assert check_overrides(overrides, overridden)
Apr 20 22:28:22 AssertionError
Apr 20 22:28:23 Failed to generate ATEN bindings: ['/var/lib/jenkins/workspace/xla/scripts/generate_code.sh']
Apr 20 22:28:23 Building torch_xla version: 1.9
Apr 20 22:28:23 XLA Commit ID: f41a53458d12b19027f02e5ec34f51d187a63368
Apr 20 22:28:23 PyTorch Commit ID: 0ec17b5443644af6742266556bb6d04206a3e8c0
Apr 20 22:28:23 + cleanup
Apr 20 22:28:23 + retcode=1
Apr 20 22:28:23 + set +x
Apr 20 22:28:23 =================== sccache compilation log ===================
Apr 20 22:28:23 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 20 22:28:23 Compile requests                    4934
Apr 20 22:28:23 Compile requests executed           4611

1 failure not recognized by patterns:

Job Step Action
CircleCI pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 Run tests 🔁 rerun

This comment was automatically generated by Dr. CI.

not ready for review yet

[ghstack-poisoned]
bdhirsh added a commit that referenced this pull request Mar 31, 2021
ghstack-source-id: 2b11df4
Pull Request resolved: #55050
bdhirsh added a commit that referenced this pull request Apr 9, 2021
ghstack-source-id: 4d2a668
Pull Request resolved: #55050
bdhirsh added a commit that referenced this pull request Apr 12, 2021
ghstack-source-id: 8a780d1
Pull Request resolved: #55050
bdhirsh added 11 commits April 12, 2021 15:04
return [r]

# TODO: how come ValuesView isn't a Sequence lol
return list(concatMap(flatten_pre_group, list(pre_grouped_native_functions.values())))
Contributor

Any semantic change here?

Collaborator Author

nope, identical. Just moved it into a function so gen_backend_stubs can re-use it. I guess the diff tool had trouble seeing that 😛

return f"{self.name}"

@dataclass(frozen=True)
class ExternalBackendMetadata:
Contributor

An alternative data model, which would require fewer structs to be defined here, is to index the external backend metadata separately from ExternalBackendFunction, rather than insisting it be directly contained here. Then, anywhere you previously would have taken an ExternalBackendFunction, you instead take a NativeFunction and then look up the ExternalBackendMetadata from the index.
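A rough sketch of the indexed alternative being described (all names hypothetical):

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class NativeFunction:
    name: str

@dataclass(frozen=True)
class ExternalBackendMetadata:
    operator: str

# Built once from the backend yaml, keyed by operator name.
metadata_index: Dict[str, ExternalBackendMetadata] = {
    "add.Tensor": ExternalBackendMetadata(operator="add.Tensor"),
}

def metadata_for(f: NativeFunction) -> Optional[ExternalBackendMetadata]:
    # Callers take a plain NativeFunction and look up metadata on demand,
    # instead of carrying it inside an ExternalBackendFunction wrapper.
    return metadata_index.get(f.name)

implemented = metadata_for(NativeFunction("add.Tensor"))
missing = metadata_for(NativeFunction("mul.Tensor"))
```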

Collaborator Author

Yeah, that reduces the amount of data-class stuff that goes in here, at the cost of requiring all of the code that acts on ExternalBackendFunctions to do a little more work.

I have a slight preference for the beefed-up data model. The properties I defined on ExternalBackendFunction save a fair amount of redundant work from all of the callers, and will abstract any future logic we add to the metadata; e.g. the fact that missing metadata means the backend hasn't implemented that op, or that `primary` will figure out whether the backend favors out vs functional operators. But let me know if you feel strongly one way or the other.

if kind == SchemaKind.out or kind == SchemaKind.inplace:
    assert self.metadata is None or not self.metadata.structured, \
        "Found an out/inplace operator marked with the structured keyword." \
        f" Only functional operators can be marked as structured. operator={str(self.native_function.func.name)}"
Contributor

Isn't there an invariant that native_function has to agree with metadata.operator?

Collaborator Author

yep, should have added that before
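The invariant could be checked alongside the existing assertion, roughly like this (stand-in types and names, not the actual code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Metadata:
    operator: str
    structured: bool

def metadata_matches(native_name: str, metadata: Optional[Metadata]) -> bool:
    # Invariant: if metadata exists, it must name the same operator
    # as the native function it is attached to.
    return metadata is None or metadata.operator == native_name

def check(native_name: str, metadata: Optional[Metadata], kind: str) -> None:
    assert metadata_matches(native_name, metadata), \
        f"metadata operator doesn't match native function {native_name}"
    if kind in ("out", "inplace"):
        assert metadata is None or not metadata.structured, \
            "Only functional operators can be marked as structured."

check("add.Tensor", Metadata("add.Tensor", structured=False), "out")
```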

import pathlib
import argparse
import os
import yaml
Contributor

For ease of in-tree development, we're probably going to want to set up some tests that exercise this code path on simplified YAML files that cover important functionality.

Contributor

In particular, unlike the in-tree native_functions.yaml, we are going to need testing for error cases and making sure we give appropriate errors in those cases. Expect tests can help make this painless.


backend = yaml_values['backend']
supported = yaml_values['supported']
supported_autograd = yaml_values['autograd']
Contributor

As this YAML is user provided it is worth being more paranoid about the inputs provided here. Good input validation here will help save you from weird error messages later. NativeFunction.from_yaml does a pretty good job of validating its arguments and giving useful error information when things have gone wrong, and I recommend modeling the code here after it, if you don't have ideas on your own.

Collaborator Author

You're right, thanks. I'm going to make the error handling a little more robust in this PR. I'll probably add some explicit tests in a followup.
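A sketch of the kind of validation being suggested, operating on the already-parsed yaml dict (as yaml.safe_load would return); the keys match the snippet above, but the helper name and messages are illustrative:

```python
from typing import Any, Dict

def parse_backend_yaml(yaml_values: Dict[str, Any]) -> Dict[str, Any]:
    # Reject keys we don't understand, rather than silently ignoring them.
    valid_keys = {"backend", "supported", "autograd"}
    unexpected = set(yaml_values.keys()) - valid_keys
    assert not unexpected, \
        f"unexpected keys in backend yaml: {sorted(unexpected)}"

    backend = yaml_values.get("backend")
    assert isinstance(backend, str), "'backend' must be a string"

    supported = yaml_values.get("supported", [])
    assert isinstance(supported, list) and all(
        isinstance(op, str) for op in supported
    ), "'supported' must be a list of operator names"

    supported_autograd = yaml_values.get("autograd", [])
    assert isinstance(supported_autograd, list), \
        "'autograd' must be a list of operator names"

    return {"backend": backend, "supported": supported,
            "autograd": supported_autograd}

parsed = parse_backend_yaml(
    {"backend": "XLA", "supported": ["add", "mul"], "autograd": []})
```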

template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates")

def make_file_manager(install_dir: str) -> FileManager:
    return FileManager(install_dir=install_dir, template_dir=template_dir, dry_run=False)
Contributor

NB: file manager is pointless unless you let people dry_run it. The abstraction solely exists to let people dry run the codegen script so that they can find out what files it writes out.


I thought FileManager (in non-dry-run mode) also suppressed output if it was unchanged? (Not that supporting dry_run here is a bad idea)

Collaborator Author
@bdhirsh bdhirsh Apr 20, 2021

ah, got it. It doesn't seem unreasonable to allow an optional dry_run flag to the external codegen, similar to what gen.py has.

And @bhosmer it looks like you're right; if dry_run is true, we never write to the file. If dry_run is false, we only write to it if the contents have changed.
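The optional flag could be wired up roughly like this (the argparse flag and the FileManager stand-in here are illustrative, not the real tools/codegen classes):

```python
import argparse

class FileManager:
    # Minimal stand-in: the real FileManager also skips writes when
    # the on-disk contents are unchanged.
    def __init__(self, install_dir: str, dry_run: bool) -> None:
        self.install_dir = install_dir
        self.dry_run = dry_run
        self.written: list = []

    def write(self, filename: str, contents: str) -> None:
        self.written.append(filename)
        if self.dry_run:
            return  # report the file, but never touch disk
        # (real implementation writes contents to install_dir/filename)

parser = argparse.ArgumentParser(description="external backend codegen")
parser.add_argument("--dry-run", action="store_true",
                    help="print files that would be written, without writing")
args = parser.parse_args(["--dry-run"])

fm = FileManager(install_dir="torch_xla/csrc", dry_run=args.dry_run)
fm.write("aten_xla_type.h", "// generated")
```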

@method_with_native_function
def __call__(self, g: Union[ExternalBackendFunctionsGroup, ExternalBackendFunction]) -> List[str]:

def gen_out_wrapper(g: ExternalBackendFunctionsGroup) -> Optional[str]:
Contributor

Why not make these methods on GenExternalAtenFallback? Would save you a level of indentation, and brings this file more in line with the prevailing style of dest modules.

Collaborator Author

One reservation that I had was for generating out wrappers: given an ExternalBackendFunctionsGroup, gen_unstructured_external() can't really operate on each ExternalBackendFunction independently: the codegen is supposed to generate an out wrapper if the out ExternalBackendFunction doesn't have ExternalMetadata, and that logic needs to know the name of the functional op (in order to call it). When the functions are nested as closures like they are now, gen_unstructured_external() has access to the higher-scope ExternalBackendFunctionsGroup.

Locally I ended up just doing what you said, but I had to update the signature to gen_unstructured_external(f: ExternalBackendFunction, g: ExternalBackendFunctionGroup), so it has all of the information it needs. I think it's an overall win for readability, since that level of indentation made things pretty confusing.

Collaborator Author

(I'm actually not gonna make this change for now, and just land the PR as is. lmk if you like the change though)

Contributor

Well, what you said here doesn't seem to apply to gen_out_wrapper. And it is totally reasonable to pass it in as an extra argument, that's what we do in other places.

I'm not too allergic to nested closures, but I know @bhosmer has been on my case about them in the past, and explicitly lifting out the closures does make it easier to reason about what information is propagated and what is not.

Collaborator Author

Removing the nested closures in this file will also be that much easier after I finish moving the gen_out_wrapper logic out of this file as part of #56835. I can do that in a follow-up.

dispatcher_sig = DispatcherSignature.from_schema(g.out.native_function.func)
name = dispatcher_sig.name()

dispatcher_order_args = list(dispatcher.jit_arguments(g.out.native_function.func))
Contributor

The necessity to list() here seems like bad design on the part of jit_arguments return type


def get_device_param(args: List[Argument]) -> str:
# TODO: the XLA codegen has specific precedence rules when determining which tensor argument
# to use as the device argument.
Contributor

for the XLA case, do they expect all tensors are on the same device? :)

Collaborator Author

Good question, I dunno :) maybe @ailzhang does.

We probably want to keep the logic for choosing which input to get the device from consistent either way though, right?

Collaborator Author

It looks like mixed xla/cpu devices are allowed, at least:

>>> import torch_xla.core.xla_model as xm
>>> dev = xm.xla_device()
>>> a = torch.ones(3, device=dev)
>>> b = torch.ones(3)
>>> a + b
tensor([2., 2., 2.], device='xla:0')
>>> a.device
device(type='xla', index=0)
>>> b.device
device(type='cpu')
>>>

namespace = 'AtenXlaTypeDefault'
payload = f"static_cast<{dispatcher_sig.decl(func_ptr_cast=True)}>(&{namespace}::{name})"
return f' m.impl("{f.native_function.func.name}", {payload});\n'

Contributor

assert something about target here?

@ezyang
Contributor

ezyang commented Apr 20, 2021

Substantively this PR looks pretty good. What would you like to do? Land as is and start merging fixups, or keep working on this until you're closer?

@bdhirsh
Collaborator Author

bdhirsh commented Apr 20, 2021

Substantively this PR looks pretty good. What would you like to do? Land as is and start merging fixups, or keep working on this until you're closer?

I'm hoping to land this PR sooner rather than later, but a bunch of the feedback looks like small things that will be easier to fix directly in the PR. I'm leaning towards making most of the quick fixes first.

@facebook-github-bot
Contributor

@bdhirsh merged this pull request in 51d0212.

@facebook-github-bot
Contributor

This pull request has been reverted by 90e532f.

heitorschueroff pushed a commit that referenced this pull request Apr 21, 2021
Summary:
Pull Request resolved: #55050

not ready for review yet

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27708346

Pulled By: bdhirsh

fbshipit-source-id: 2289edd641f30277d7561cf2d48ec69c6a2137a9
bdhirsh added a commit that referenced this pull request Apr 21, 2021
… in-tree""

This is a re-land of #55050, but with a small change (pasted the patch below).

I confirmed that RegistrationDeclarations.yaml doesn't change after this PR with the following:
```
diff -Naur build/aten/src/ATen/RegistrationDeclarations.h ../pytorch/build/aten/src/ATen/RegistrationDeclarations.h
```

patch:
```
--- a/tools/codegen/api/types.py
+++ b/tools/codegen/api/types.py
@@ -41,12 +41,12 @@ tensorListT = BaseCppType('at', 'TensorList')
 dimnameT = BaseCppType('at', 'Dimname')
 dimnameListT = BaseCppType('at', 'DimnameList')
 layoutT = BaseCppType('at', 'Layout')
-deviceT = BaseCppType('c10', 'Device')
+deviceT = BaseCppType('at', 'Device')
 scalarT = BaseCppType('at', 'Scalar')
 memoryFormatT = BaseCppType('at', 'MemoryFormat')
 qschemeT = BaseCppType('at', 'QScheme')
 storageT = BaseCppType('at', 'Storage')
-streamT = BaseCppType('', 'Stream')
+streamT = BaseCppType('at', 'Stream')
 intArrayRefT = BaseCppType('at', 'IntArrayRef')
 tensorOptionsT = BaseCppType('at', 'TensorOptions')
 typeAndSizeT = BaseCppType('torch::autograd::generated', 'TypeAndSize')
```

I must have accidentally added the `c10` when I was debugging, to reduce the number of byte-for-byte changes in `aten_xla_type_default.cpp`. That caused a change to `RegistrationDeclarations.yaml`, which is BC-breaking for any downstream consumers.



Updating it to ensure that RegistrationDeclarations.yaml is completely
unchanged

This reverts commit 90e532f.

Differential Revision: [D27915305](https://our.internmc.facebook.com/intern/diff/D27915305)

[ghstack-poisoned]
bdhirsh added a commit that referenced this pull request Apr 21, 2021
bdhirsh added a commit that referenced this pull request Apr 21, 2021
bdhirsh added a commit that referenced this pull request Apr 21, 2021
PIlotcnc pushed a commit to PIlotcnc/pytorch that referenced this pull request Apr 24, 2021
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021