Structured kernel definitions #45277
Conversation
See https://fb.quip.com/o4JUAuefYVP3 Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
See https://fb.quip.com/o4JUAuefYVP3 Signed-off-by: Edward Z. Yang <ezyang@fb.com> ghstack-source-id: a43d166 Pull Request resolved: #45277
💊 CI failures summary and remediations — As of commit 3a1a4c8 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 (This comment was automatically generated by Dr. CI; it has been revised 70 times.)
tools/codegen/model.py (Outdated)

```python
SchemaKind = Enum('SchemaKind', ('functional', 'inplace', 'out'))

@dataclass(frozen=True)
class NativeFunctionGroup:
```
Yeah, should backport this one to master ASAP
See https://fb.quip.com/o4JUAuefYVP3 This mostly follows the same structure as the proposal, though there have been some short cuts taken. It doesn't currently build because I haven't actually implemented the meta function for the function I marked as structured. Still needs a lot of work, the prototype is here just to show feasibility. Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
See pytorch/rfcs#9 This mostly follows the same structure as the proposal, though there have been some short cuts taken. It doesn't currently build because I haven't actually implemented the meta function for the function I marked as structured. Still needs a lot of work, the prototype is here just to show feasibility. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: [D24253555](https://our.internmc.facebook.com/intern/diff/D24253555) [ghstack-poisoned]
Implements structured kernels as per pytorch/rfcs#9 and ports upsample_nearest1d to use the framework. There is a new meta API which is the calling convention for TensorMeta calculation functions. Most of the new codegen lives in structured_func; check out the RFC for an explanation of what the code looks like.

Missing pieces:
- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work

There are some hacks which I can work harder to unwind:
- I need to get upsample_nearest1d registered as abstract: True in Declarations.yaml even though it has no dispatch table (as it is implicitly filled by upsample_nearest1d.out). I ended up hacking this up by just adding a new field 'abstract: True' that lets you manually override the abstractness. Better would be to teach the codegen to fill this in correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D24253555](https://our.internmc.facebook.com/intern/diff/D24253555)
[ghstack-poisoned]
```cpp
namespace at {

struct TensorMeta {
```
Not related to this PR - just curious what your thoughts are about how this might evolve in the future: are we going to add more and more TensorImpl fields here, e.g. contiguous, channels_last, etc., which are more "internal" properties?
strides will subsume contiguous/channels_last, and will be enough to fully specify the output without getting into non dense tensors. There might be more extensibility here when TensorIterator comes in the mix though.
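To illustrate the claim that strides subsume the layout flags, here is a toy sketch (not PyTorch internals): both "contiguous" and "channels_last" are just particular stride assignments derived from sizes, so a single `strides` field can express either.

```python
# Sketch only: "contiguous" and "channels_last" are both expressible
# as plain stride vectors computed from the sizes.

def contiguous_strides(sizes):
    # Row-major layout: innermost dimension varies fastest.
    strides = [1] * len(sizes)
    for i in reversed(range(len(sizes) - 1)):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides

def channels_last_strides(sizes):
    # NCHW sizes laid out as NHWC in memory (4-D only).
    n, c, h, w = sizes
    return [h * w * c, 1, w * c, c]

print(contiguous_strides([2, 3, 4, 5]))     # [60, 20, 5, 1]
print(channels_last_strides([2, 3, 4, 5]))  # [60, 1, 15, 3]
```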
Or should we consider this class to replace the existing "collection of fields" on TensorImpl?
I'd prefer to keep them decoupled for now, to make it easier to make representational changes in one or the other.
+1, we can capture congruences later if they arise naturally (and pay their way)
tools/codegen/gen.py (Outdated)

```python
const OptionalDeviceGuard device_guard(device_of({device_of}));
"""

return f"""\
```
When I studied the codegen, templates like this helped me learn the connection between internal functions and the final output; but sometimes I felt seeing a little concrete output example could be more intuitive (the downside is that we need to remember to update the example). Maybe include a link to the RFC above structured_func()?
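One way to act on this suggestion, sketched with made-up names (the real templates live in tools/codegen/gen.py and are more involved): keep the template, but pin a concrete sample of its output in a comment right next to it.

```python
# Hypothetical miniature of a codegen template; the function name and
# the emitted C++ are illustrative, not the real generated code.
def emit_wrapper(kernel_name: str, device_of: str) -> str:
    # Sample output for emit_wrapper("upsample_nearest1d_cpu", "self"):
    #
    #   const OptionalDeviceGuard device_guard(device_of(self));
    #   return upsample_nearest1d_cpu(self);
    #
    return f"""\
const OptionalDeviceGuard device_guard(device_of({device_of}));
return {kernel_name}({device_of});
"""

print(emit_wrapper("upsample_nearest1d_cpu", "self"))
```

The downside noted above still applies: the sample in the comment has to be kept in sync by hand.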
tools/codegen/gen.py (Outdated)

```python
if target is Target.DECLARATION:
    # Not necessary! Only TypeDefault is actually used, see
    # https://github.com/pytorch/pytorch/issues/46319
    return ""
```
Maybe have subfunc return Optional[str] to fit the pattern of the other compute methods, if there's no semantic difference between empty string and None?
Mooted as we no longer gen declarations.
bhosmer left a comment:
A few notes inline but this looks clean and solid to me. I'm glad it stops short of building the Common dispatch key setup described in the RFC - definitely have more worries about that than anything here 😁
```cpp
struct TensorMeta {
  DimVector sizes;
  // TODO: DimVector strides;
  TensorOptions options;
```
Would an explicit constructor for TensorMeta spare us redundant default initialization?
Yeah, hoping to flush this and other inefficiencies out when I'm ready for benchmarking.
Have a constructor now.
aten/src/ATen/TensorMeta.h (Outdated)

```cpp
inline TensorMeta new_meta(const Tensor& self, IntArrayRef sizes) {
  TensorMeta m;
  m.sizes = sizes;
  m.options = self.options();
```
...re the explicit constructor comment above, here e.g. I think an explicit TensorMeta m{sizes, self.options()} would express intent directly, rather than relying on the compiler to be smart about eliding redundant inits (and without really sacrificing readability)
much as I like the brevity of sizes, I think calling the parameter output_sizes might avoid some confusion here
```cpp
check_dim_size(grad_output, 3, 1, full_output_size[1]);
check_dim_size(grad_output, 3, 2, full_output_size[2]);

return new_meta(grad_output, input_size);
```
The way you've factored this totally conjures the notion of canned forward and backward utility functions wrapped around an op-specific lambda like upsample_nearest1d_common_check, but a sample size of 1 is probably an influence here.
I hope there are opportunities for utility functions; need to do more functions to say though.
```yaml
dispatch:
  CPU: upsample_nearest1d_cpu
  CUDA: upsample_nearest1d_cuda
abstract: True # HACK
```
Also - I know the rfc posits a config-by-convention world where the functional and inplace variants don't need to say anything and the whole setup is triggered by structured: True in the foo.out variant. But I wonder if we won't run into trouble when the naming convention needs to be disrupted for one reason or other. In that world we might want a field that actually specifies the structured op we want to delegate to - delegate: upsample_nearest_1d.out or something.
Among other things, that would give us the opportunity to checksum the out variant being structured: True, as opposed to silently generating code that would result in link errors 20 minutes later or whatever :)
Not 100% sure this pays its way, but it feels like the added expression of intent might be a usability win, as well as loosening up the coupling to the (limits of the) current naming convention.
I don't intend to land the PR with the abstract keyword. Need to do some major pre-surgery to get rid of it, though.
> But I wonder if we won't run into trouble when the naming convention needs to be disrupted for one reason or other.

There's no overload naming convention here; matching of signatures is done purely by looking at the type of the function, and the function name (sans overload name). That being said...

> In that world we might want a field that actually specifies the structured op we want to delegate to - delegate: upsample_nearest_1d.out or something.

I'm OK with doing this. It is technically redundant information, but it gives us an opportunity to give better error messages and makes the intent in the YAML more clear. That being said...

> opposed to silently generating code that would result in link errors 20 minutes later or whatever :)

I'm pretty sure that in the current formulation, if you mismatch the signatures of two functions in the same structured block, you'll get a codegen error (because the dangling signature will fail to find the _out version).
You're absolutely right, the grouping machinery is definitely robust enough to handle all the cases I was thinking about (missing out target, multiple clusters in a single op, overload name shenanigans). So I think the only motivation that remains is the thing about improving errors based on expressed intent. [Edit: ...and readability, in the sense of knowing at a functional or inplace kernel's declaration site whether the generated code will be delegating to a structured kernel or not.]
@bhosmer There are also more serious changes we can make to how we setup native_functions.yaml; for example, we can force the user to group related signatures. This actually hearkens to my original original proposal (https://fb.quip.com/Z7PYApb0uqwR#VBbACAwaTQ4 FB-only sorry), but people seemed allergic to it the first time around.
people seemed allergic to it the first time around.
Do you remember what the counterarguments were? I always thought grouping variants together was a good idea - not just for readability in the YAML but also to spare the intake logic from performing exactly that grouping logic (plus sanity checks etc.) behind the scenes.
Now that structured kernels will be adding an explicit intra-group relationship, having a declaration syntax that puts members together makes even more sense to me.
I guess one possible issue is if different variants support different dispatch keys. Does that ever happen?
For the record, the new version of the patch doesn't group native_functions.yaml, but has structured and structured_delegate to make the structure here more explicit
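For intuition, the behind-the-scenes grouping that explicit surface grouping would make unnecessary looks roughly like this (a toy sketch with made-up kind heuristics, not the real tools/codegen logic, which inspects full schemas):

```python
from collections import defaultdict

def base_name(op: str) -> str:
    # "upsample_nearest1d.out" -> "upsample_nearest1d"
    return op.split(".")[0]

def kind(op: str) -> str:
    # Toy classification; the real codegen looks at the signature.
    if op.endswith(".out"):
        return "out"
    if base_name(op).endswith("_"):
        return "inplace"
    return "functional"

def group(ops):
    # Cluster functional/inplace/out variants under one base name.
    clusters = defaultdict(dict)
    for op in ops:
        clusters[base_name(op).rstrip("_")][kind(op)] = op
    return dict(clusters)

ops = ["add.Tensor", "add_.Tensor", "add.out",
       "upsample_nearest1d.vec", "upsample_nearest1d.out"]
print(group(ops)["add"])  # the three "add" variants, reunited
```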
```python
# the tensor in question

def name(f: FunctionSchema) -> str:
    assert f.name.overload_name == ""
```
might be worth listing this in the constraints section above... but also, is it necessary? I could imagine func/in/out triads that were all in the overloaded namespace
In the one example I ported they all have overload names :) This invariant enforces that you "stripped" the overload name using signature() to get at the functional signature.
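The invariant being enforced can be sketched with stand-in types (simplified stand-ins, not the real tools/codegen/model.py classes):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class OperatorName:
    base: str
    overload_name: str

@dataclass(frozen=True)
class FunctionSchema:
    name: OperatorName

    def signature(self) -> "FunctionSchema":
        # Strip the overload name to get the shared functional
        # signature (the real signature() normalizes more than this).
        return replace(self, name=replace(self.name, overload_name=""))

def name(f: FunctionSchema) -> str:
    # Callers must pass a schema already run through signature();
    # the assert enforces that the overload name was stripped.
    assert f.name.overload_name == ""
    return f.name.base

out = FunctionSchema(OperatorName("upsample_nearest1d", "out"))
print(name(out.signature()))  # upsample_nearest1d
```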
tools/codegen/api/types.py (Outdated)

```python
class MetaArgument:
    type: str
    name: str
    # By fiat, meta argument functions must be on full c10 dispach
```
nit: dispatch
But also, I'm not sure what this means in relation to the line below..
Basically, there is always a one-to-one correspondence between a JIT argument (Argument) and a meta argument. This is true because I am going to explicitly not support structured kernels for function signatures that cannot support use_c10_dispatcher: full.
tools/codegen/gen.py (Outdated)

```python
if local.use_c10_dispatcher() is UseC10Dispatcher.full:
    return f'm.impl("{f.func.name}", TORCH_FN({type_name}));'
else:
    # SIGHHHHH
```
I'm not going to block this if you disagree with me, but I want to make sure you're aware of my concerns. We finally cleaned up some of the old codegen mess and are in the process of cleaning up more, but I don't think that this means codegen is now ok and we can run on and add more complexity to it. The new codegen is much easier to understand, and to preserve that we should not add new features to it but we should work on removing more and more features from it until it is even easier to understand.

Codegen in general is still bad. The problem with codegen is not only that the old codegen was unreadable, but also that there's an additional system that people need to know about and understand when they want to understand how our system works. Generated code is only present after you built PyTorch, so you can't easily grep for functions, follow definitions in your IDE, or even try to find which file the function you're looking for is in. It's much easier to figure out what your code is doing when there's no codegen involved.

So far, the only software component where we really need codegen seems to be the C++ frontend, and I thought we had set the vision a while ago of removing all codegen except for the C++ frontend. @ezyang When you started this work stream, we had talked about templated solutions that did seem to work; why did we switch back to codegen? I know the design has changed since then and our original solution wouldn't work anymore, but have you tried finding a solution without codegen for the new design?
@smessmer I think I answered your question in the RFC at https://github.com/pytorch/rfcs/pull/9/files#diff-97e82d5fc6776ce3ff1c347db7b6344e877322602d5a4301a6e6156e489a3190R295
tools/codegen/gen.py (Outdated)

```python
    return ""
elif target is Target.DEFINITION:
    assert dispatch is not None
    assert g.out is not None  # sigh
```
I think this is already tested on line 341, no?
BTW, this gives me the opportunity to complain again about how big multiply-nested functions are easy to write but hard to read 😬 As I read this PR I'm finding myself using search more than I'd like to track down things like target - a local or param that's 2 nesting levels and 250 lines away might as well be a global, in terms of readability.
Right now the code is consistent and disciplined, which moderates the effect. But (sorry, broken record) the worry is the future - I'm pretty sure the code you just rewrote looked something like this when freshly written (mod typing and data modeling, ofc).
Anyway I wouldn't push for a rewrite but I'll take the opportunity to tiresomely repeat that I don't know many tactics with better entropy-avoiding ROI than sticking to flat functions whose dependencies are all explicitly plumbed in as parameters.
@bhosmer here's a suggested restructuring: what if we just make classes to represent the closures in question? Then the closed over variables will be explicitly notated as self.blah and you can see what all the closed over variables are in one place.
Yeah, definitely seems like that would solve the readability/entropy worry of ambient referents equally well. I'm not sure as an idiom it's any less cumbersome to write than non-closure functions with everything explicitly passed in, but I'm definitely interested to see how it turns out, if you're down to give it a spin 😁
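The closures-as-classes idea can be sketched like so (hypothetical names; the real gen.py computation is much richer):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ComputeFunction:
    # Every formerly closed-over variable becomes an explicit,
    # greppable field instead of an ambient referent.
    target: str              # e.g. "REGISTRATION" or "DEFINITION"
    dispatch: Optional[str]  # e.g. "CPU", or None for catch-all

    def __call__(self, func_name: str) -> str:
        if self.target == "REGISTRATION":
            return f'm.impl("{func_name}", TORCH_FN({func_name}));'
        return f"// definition of {func_name} for {self.dispatch}"

gen = ComputeFunction(target="REGISTRATION", dispatch="CPU")
print(gen("upsample_nearest1d"))
# m.impl("upsample_nearest1d", TORCH_FN(upsample_nearest1d));
```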
tools/codegen/gen.py (Outdated)

```python
def group_func(g: NativeFunctionGroup) -> List[str]:
    if g.structured():
        return structured_func(g)
```
IIUC the dispatch key checks here might kick us back out with [], is that ok downstream?
Err, yes? The whole point of returning [] sometimes is that sometimes you don't have a registration for a given dispatch key
Ah got it - I was reading "unsupported" in the called function as meaning unsupported by structured kernels specifically - forgot that this gets called for literally all dispatch keys. All caught up now 😁
Re the new yaml syntax bakeoff - reiterating a comment above, I'd strongly pitch explicit grouping in the surface syntax; I think it's strictly better. (Which also means I'm completely retracting the 'delegate' variation - it solves a problem that doesn't exist with explicit grouping.)

In the RFC you mention "harder to ingest for people who don't care about grouping" - if you mean humans, I can't imagine any reading mode where the grouped syntax isn't better - clearer, more informative about the way the system structures things. If you mean software, I'm pretty sure it's easier to ungroup on load than it is to group, like the main codegen script does currently. As a minor point, I wouldn't introduce a new keyword either.
For posterity's sake: after @bhosmer's most recent post about the syntax, we discussed various options in our meeting, and concluded that while rounding up all the syntax would make sense, it still should be treated as an orthogonal problem. So I'm going to do some version of #45277 (comment) in the short term.
Implements structured kernels as per pytorch/rfcs#9 and ports upsample_nearest1d to use the framework. The general structure of this diff:

- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model
- NativeFunctionGroup turns into StructuredNativeFunctions. Previously I thought that maybe we would use this grouping mechanism for both structured and unstructured kernels, but it turned out that Jiakai needed to make his own grouping structure. So now I've specialized it for structured kernels, which also means I get to add a bunch of invariants, like requiring structured kernels to have both a functional and an out variant. This is the lower bundle of changes in tools.codegen.model
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by actual operators we port as we go.
- The meaty part of code generation is what we do when we have some grouped StructuredNativeFunctions. We continue to generate a wrapper per function type, but they are a bit different, as they call your meta functions and make reference to the actual implementations in out.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.

Missing pieces:
- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D24253555](https://our.internmc.facebook.com/intern/diff/D24253555)
[ghstack-poisoned]
@bhosmer I fixed up all the hacks. The structure changed quite a bit because a lot of preliminary refactoring got moved earlier. I updated the PR description. I'm going to move towards landing, but take a look at the new version.
```diff
  out = d.get(SchemaKind.out)
- return NativeFunctionGroup(
+ if functional is None or out is None or not out.structured:
+     return None
```
Should it be an error to specify out as structured without declaring a functional variant? (Or not an error but an assert, if it's already been checked upstream)
Don't know if there are callers that need this to be permissive, but e.g. the code path that runs through flatten_pre_group() looks like it might benefit from error checking.
It's checked separately, in validate_unstructured. I didn't put error tests here because I don't have error context available.
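The permissive-grouping-plus-separate-validation split can be sketched as follows (hypothetical dict-based stand-ins for the real dataclasses and error machinery):

```python
from typing import Dict, Optional

def from_dict(d: Dict[str, dict]) -> Optional[dict]:
    # Permissive: incomplete clusters simply aren't structured groups.
    functional = d.get("functional")
    out = d.get("out")
    if functional is None or out is None or not out.get("structured"):
        return None
    return {"functional": functional, "inplace": d.get("inplace"), "out": out}

def validate_unstructured(d: Dict[str, dict], loc: str) -> None:
    # Separate pass where error context (loc) is available, so the
    # message can point at the offending declaration.
    out = d.get("out")
    if out is not None and out.get("structured") and "functional" not in d:
        raise RuntimeError(f"{loc}: structured out= kernel has no functional variant")

complete = {"functional": {}, "out": {"structured": True}}
broken = {"out": {"structured": True}}
print(from_dict(complete) is not None)  # True
print(from_dict(broken))                # None
```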
```python
# defined. This is for conveniently reporting error messages!
loc: 'Location'

# Whether or not this out function is a "structured kernel". Structured

# Whether or not this non-out function is a structured kernel, defined
# in terms of the out kernel referenced by the string here.
structured_delegate: Optional['OperatorName']
```
Data modeling OCD obliges me to note the room for nonsense in having both structured and structured_delegate floating around in the representation 😁
In a perfect world we'd have a single property whose type depended on func.kind, but in the absence of that, there's no great choice (assuming making SchemaKind into an ADT that carries this information directly is either impossible in mypy, or unwanted for some other reason).
In the vein of "tighten the representation and provide derived properties for convenience", here's a half-hearted pitch for `structured: Optional['OperatorName']`, where out functions just use '' or whatever, `is_structured()` just tests `is not None`, and `get_structured_delegate()` could do the obvious.
Yeah, this choice was driven by what I wanted the surface syntax to look like, and then trying to keep the NativeFunction class looking similar. What do you think is clearer:

```yaml
- func: foo.out
  structured: None
- func: foo.Tensor
  structured: foo.out
```

or

```yaml
- func: foo.out
  structured: true
- func: foo.Tensor
  structured_delegate: foo.out
```
#2. I think "structured: None means it's structured" would be hella confusing. (More generally I think it's better in the surface syntax to have the attributes be, like, monomorphic :)
Matching representation to surface syntax is a nice through line - might be worth a one-liner in the comment header for NativeFunction (apologies if it's there already and I missed it)
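A sketch of option 2 as a representation, with an invariant check guarding the nonsense states flagged above (field names follow the discussion; the out-variant detection and everything else here is a toy, not the real model.py):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NativeFunction:
    func: str                                  # e.g. "foo.out" or "foo.Tensor"
    structured: bool = False                   # only valid on out= variants
    structured_delegate: Optional[str] = None  # only valid on non-out variants

    def __post_init__(self):
        is_out = self.func.endswith(".out")  # toy out-variant detection
        assert not (self.structured and not is_out), \
            "structured: True is only valid on out= functions"
        assert not (self.structured_delegate and is_out), \
            "structured_delegate is only valid on non-out functions"

out_fn = NativeFunction(func="foo.out", structured=True)
fn = NativeFunction(func="foo.Tensor", structured_delegate="foo.out")
print(out_fn.structured, fn.structured_delegate)  # True foo.out
```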
Looks great! A couple of drive-by observations, but nothing big.
Stack from ghstack:
Implements structured kernels as per pytorch/rfcs#9 and ports upsample_nearest1d to use the framework.
The general structure of this diff:
- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by actual operators we port as we go.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.

Missing pieces:
- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work
This PR improves instruction counts on `upsample_nearest1d` because it eliminates an extra redispatch. Testing `at::upsample_nearest1d(x, {10});`. These numbers may be jittered up to ±16400 (which is the difference when I tested against an unaffected operator, `at::upsample_linear1d`), though that may also be because unrelated changes affected all operators globally.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D24253555