Structured kernel definitions #45277
Conversation
See https://fb.quip.com/o4JUAuefYVP3 Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
See https://fb.quip.com/o4JUAuefYVP3 Signed-off-by: Edward Z. Yang <ezyang@fb.com> ghstack-source-id: a43d166 Pull Request resolved: #45277
💊 CI failures summary and remediations — As of commit 3a1a4c8 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 (This comment was automatically generated by Dr. CI; it has been revised 70 times.)
tools/codegen/model.py (Outdated)

```python
SchemaKind = Enum('SchemaKind', ('functional', 'inplace', 'out'))

@dataclass(frozen=True)
class NativeFunctionGroup:
```
Yeah, should backport this one to master ASAP
See https://fb.quip.com/o4JUAuefYVP3 This mostly follows the same structure as the proposal, though there have been some short cuts taken. It doesn't currently build because I haven't actually implemented the meta function for the function I marked as structured. Still needs a lot of work, the prototype is here just to show feasibility. Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
See pytorch/rfcs#9 This mostly follows the same structure as the proposal, though there have been some short cuts taken. It doesn't currently build because I haven't actually implemented the meta function for the function I marked as structured. Still needs a lot of work, the prototype is here just to show feasibility. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: [D24253555](https://our.internmc.facebook.com/intern/diff/D24253555) [ghstack-poisoned]
Implements structured kernels as per pytorch/rfcs#9 and ports upsample_nearest1d to use the framework. There is a new meta API which is the calling convention for TensorMeta calculation functions. Most of the new codegen lives in structured_func; check out the RFC for an explanation of what the code looks like.

Missing pieces:
- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work

There are some hacks which I can work harder to unwind:
- I need to get upsample_nearest1d registered as abstract: True in Declarations.yaml even though it has no dispatch table (as it is implicitly filled by upsample_nearest1d.out). I ended up hacking this up by just adding a new field 'abstract: True' that lets you manually override the abstractness. Better would be to teach the codegen to fill this in correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D24253555](https://our.internmc.facebook.com/intern/diff/D24253555)
[ghstack-poisoned]
```cpp
namespace at {

struct TensorMeta {
```
Not related to this PR - just curious what your thoughts are about how this might evolve in the future: are we going to add more and more TensorImpl fields here, e.g. contiguous, channels_last, etc., which are more "internal" properties?
strides will subsume contiguous/channels_last, and will be enough to fully specify the output without getting into non dense tensors. There might be more extensibility here when TensorIterator comes in the mix though.
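To illustrate the claim that strides subsume the layout flags, here is a toy sketch (not PyTorch internals): both "contiguous" and "channels_last" are just particular stride assignments derived from sizes, so a single `strides` field can express either.

```python
# Sketch only: "contiguous" and "channels_last" are both expressible
# as plain stride vectors computed from the sizes.

def contiguous_strides(sizes):
    # Row-major layout: innermost dimension varies fastest.
    strides = [1] * len(sizes)
    for i in reversed(range(len(sizes) - 1)):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides

def channels_last_strides(sizes):
    # NCHW sizes laid out as NHWC in memory (4-D only).
    n, c, h, w = sizes
    return [h * w * c, 1, w * c, c]

print(contiguous_strides([2, 3, 4, 5]))     # [60, 20, 5, 1]
print(channels_last_strides([2, 3, 4, 5]))  # [60, 1, 15, 3]
```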
Or should we consider this class to replace the existing "collection of fields" on TensorImpl?
I'd prefer to keep them decoupled for now, to make it easier to make representational changes in one or the other.
+1, we can capture congruences later if they arise naturally (and pay their way)
tools/codegen/gen.py (Outdated)

```python
const OptionalDeviceGuard device_guard(device_of({device_of}));
"""

return f"""\
```
When I studied the codegen, templates like this helped me learn the connection between internal functions and the final output; but sometimes I felt seeing a little concrete output example could be more intuitive (the downside is that we need to remember to update the example). Maybe include a link to the RFC above structured_func()?
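One way to act on this suggestion, sketched with made-up names (the real templates live in tools/codegen/gen.py and are more involved): keep the template, but pin a concrete sample of its output in a comment right next to it.

```python
# Hypothetical miniature of a codegen template; the function name and
# the emitted C++ are illustrative, not the real generated code.
def emit_wrapper(kernel_name: str, device_of: str) -> str:
    # Sample output for emit_wrapper("upsample_nearest1d_cpu", "self"):
    #
    #   const OptionalDeviceGuard device_guard(device_of(self));
    #   return upsample_nearest1d_cpu(self);
    #
    return f"""\
const OptionalDeviceGuard device_guard(device_of({device_of}));
return {kernel_name}({device_of});
"""

print(emit_wrapper("upsample_nearest1d_cpu", "self"))
```

The downside noted above still applies: the sample in the comment has to be kept in sync by hand.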
tools/codegen/gen.py (Outdated)

```python
if target is Target.DECLARATION:
    # Not necessary! Only TypeDefault is actually used, see
    # https://github.com/pytorch/pytorch/issues/46319
    return ""
```
Maybe have subfunc return Optional[str] to fit the pattern of the other compute methods, if there's no semantic difference between empty string and None?
Mooted as we no longer gen declarations.
bhosmer left a comment:
A few notes inline but this looks clean and solid to me. I'm glad it stops short of building the Common dispatch key setup described in the RFC - definitely have more worries about that than anything here 😁
```cpp
struct TensorMeta {
  DimVector sizes;
  // TODO: DimVector strides;
  TensorOptions options;
```
Would an explicit constructor for TensorMeta spare us redundant default initialization?
Yeah, hoping to flush this and other inefficiencies out when I'm ready for benchmarking.
Have a constructor now.
aten/src/ATen/TensorMeta.h (Outdated)

```cpp
inline TensorMeta new_meta(const Tensor& self, IntArrayRef sizes) {
  TensorMeta m;
  m.sizes = sizes;
  m.options = self.options();
```
...re the explicit constructor comment above, here e.g. I think an explicit TensorMeta m{sizes, self.options()} would express intent directly, rather than relying on the compiler to be smart about eliding redundant inits (and without really sacrificing readability)
much as I like the brevity of sizes, I think calling the parameter output_sizes might avoid some confusion here
```cpp
check_dim_size(grad_output, 3, 1, full_output_size[1]);
check_dim_size(grad_output, 3, 2, full_output_size[2]);

return new_meta(grad_output, input_size);
```
The way you've factored this totally conjures the notion of canned forward and backward utility functions wrapped around an op-specific lambda like upsample_nearest1d_common_check, but a sample size of 1 is probably an influence here.
I hope there are opportunities for utility functions; need to do more functions to say though.
```yaml
dispatch:
  CPU: upsample_nearest1d_cpu
  CUDA: upsample_nearest1d_cuda
abstract: True # HACK
```
Also - I know the rfc posits a config-by-convention world where the functional and inplace variants don't need to say anything and the whole setup is triggered by structured: True in the foo.out variant. But I wonder if we won't run into trouble when the naming convention needs to be disrupted for one reason or other. In that world we might want a field that actually specifies the structured op we want to delegate to - delegate: upsample_nearest_1d.out or something.
Among other things, that would give us the opportunity to checksum the out variant being structured: True, as opposed to silently generating code that would result in link errors 20 minutes later or whatever :)
Not 100% sure this pays its way, but it feels like the added expression of intent might be a usability win, as well as loosening up the coupling to the (limits of the) current naming convention.
I don't intend to land the PR with the abstract keyword. Need to do some major pre-surgery to get rid of it, though.
> But I wonder if we won't run into trouble when the naming convention needs to be disrupted for one reason or other.

There's no overload naming convention here; matching of signatures is done purely by looking at the type of the function, and the function name (sans overload name). That being said...

> In that world we might want a field that actually specifies the structured op we want to delegate to - delegate: upsample_nearest_1d.out or something.

I'm OK with doing this. It is technically redundant information, but it gives us an opportunity to give better error messages and makes the intent in the YAML more clear. That being said...

> opposed to silently generating code that would result in link errors 20 minutes later or whatever :)

I'm pretty sure that in the current formulation, if you mismatch the signatures of two functions in the same structured block, you'll get a codegen error (because the dangling signature will fail to find the _out version).
You're absolutely right, the grouping machinery is definitely robust enough to handle all the cases I was thinking about (missing out target, multiple clusters in a single op, overload name shenanigans). So I think the only motivation that remains is the thing about improving errors based on expressed intent. [Edit: ...and readability, in the sense of knowing at a functional or inplace kernel's declaration site whether the generated code will be delegating to a structured kernel or not.]
@bhosmer There are also more serious changes we can make to how we setup native_functions.yaml; for example, we can force the user to group related signatures. This actually hearkens to my original original proposal (https://fb.quip.com/Z7PYApb0uqwR#VBbACAwaTQ4 FB-only sorry), but people seemed allergic to it the first time around.
people seemed allergic to it the first time around.
Do you remember what the counterarguments were? I always thought grouping variants together was a good idea - not just for readability in the YAML but also to spare the intake logic from performing exactly that grouping logic (plus sanity checks etc.) behind the scenes.
Now that structured kernels will be adding an explicit intra-group relationship, having a declaration syntax that puts members together makes even more sense to me.
I guess one possible issue is if different variants support different dispatch keys. Does that ever happen?
For the record, the new version of the patch doesn't group native_functions.yaml, but has structured and structured_delegate to make the structure here more explicit
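For intuition, the behind-the-scenes grouping that explicit surface grouping would make unnecessary looks roughly like this (a toy sketch with made-up kind heuristics, not the real tools/codegen logic, which inspects full schemas):

```python
from collections import defaultdict

def base_name(op: str) -> str:
    # "upsample_nearest1d.out" -> "upsample_nearest1d"
    return op.split(".")[0]

def kind(op: str) -> str:
    # Toy classification; the real codegen looks at the signature.
    if op.endswith(".out"):
        return "out"
    if base_name(op).endswith("_"):
        return "inplace"
    return "functional"

def group(ops):
    # Cluster functional/inplace/out variants under one base name.
    clusters = defaultdict(dict)
    for op in ops:
        clusters[base_name(op).rstrip("_")][kind(op)] = op
    return dict(clusters)

ops = ["add.Tensor", "add_.Tensor", "add.out",
       "upsample_nearest1d.vec", "upsample_nearest1d.out"]
print(group(ops)["add"])  # the three "add" variants, reunited
```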
```python
# the tensor in question

def name(f: FunctionSchema) -> str:
    assert f.name.overload_name == ""
```
might be worth listing this in the constraints section above... but also, is it necessary? I could imagine func/in/out triads that were all in the overloaded namespace
In the one example I ported they all have overload names :) This invariant enforces that you "stripped" the overload name using signature() to get at the functional signature.
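The invariant being enforced can be sketched with stand-in types (simplified stand-ins, not the real tools/codegen/model.py classes):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class OperatorName:
    base: str
    overload_name: str

@dataclass(frozen=True)
class FunctionSchema:
    name: OperatorName

    def signature(self) -> "FunctionSchema":
        # Strip the overload name to get the shared functional
        # signature (the real signature() normalizes more than this).
        return replace(self, name=replace(self.name, overload_name=""))

def name(f: FunctionSchema) -> str:
    # Callers must pass a schema already run through signature();
    # the assert enforces that the overload name was stripped.
    assert f.name.overload_name == ""
    return f.name.base

out = FunctionSchema(OperatorName("upsample_nearest1d", "out"))
print(name(out.signature()))  # upsample_nearest1d
```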
tools/codegen/api/types.py (Outdated)

```python
class MetaArgument:
    type: str
    name: str
    # By fiat, meta argument functions must be on full c10 dispach
```
nit: dispatch
But also, I'm not sure what this means in relation to the line below..
Basically, there is always a one-to-one correspondence between a JIT argument (Argument) and a meta argument. This is true because I am going to explicitly not support structured kernels for function signatures that cannot support use_c10_dispatcher: full.
tools/codegen/gen.py (Outdated)

```python
if local.use_c10_dispatcher() is UseC10Dispatcher.full:
    return f'm.impl("{f.func.name}", TORCH_FN({type_name}));'
else:
    # SIGHHHHH
```
I'm not going to block this if you disagree with me, but I want to make sure you're aware of my concerns. We finally cleaned up some of the old codegen mess and are in the process of cleaning up more, but I don't think that this means codegen is now ok and we can run on and add more complexity to it. The new codegen is much easier to understand, and to preserve that we should not add new features to it but we should work on removing more and more features from it until it is even easier to understand.

Codegen in general is still bad. The problem with codegen is not only that the old codegen was unreadable, but also that there's an additional system that people need to know about and understand when they want to understand how our system works. Generated code is only present after you built PyTorch, so you can't easily grep for functions, follow definitions in your IDE, or even try to find which file the function you're looking for is in. It's much easier to figure out what your code is doing when there's no codegen involved.

So far, the only software component where we really need codegen seems to be the C++ frontend, and I thought we had set the vision a while ago of removing all codegen except for the C++ frontend. @ezyang When you started this work stream, we had talked about templated solutions that did seem to work; why did we switch back to codegen? I know the design has changed since then and our original solution wouldn't work anymore, but have you tried finding a solution without codegen for the new design?
@smessmer I think I answered your question in the RFC at https://github.com/pytorch/rfcs/pull/9/files#diff-97e82d5fc6776ce3ff1c347db7b6344e877322602d5a4301a6e6156e489a3190R295
tools/codegen/gen.py (Outdated)

```python
    return ""
elif target is Target.DEFINITION:
    assert dispatch is not None
    assert g.out is not None  # sigh
```
I think this is already tested on line 341, no?
BTW, this gives me the opportunity to complain again about how big multiply-nested functions are easy to write but hard to read 😬 As I read this PR I'm finding myself using search more than I'd like to track down things like target - a local or param that's 2 nesting levels and 250 lines away might as well be a global, in terms of readability.
Right now the code is consistent and disciplined, which moderates the effect. But (sorry, broken record) the worry is the future - I'm pretty sure the code you just rewrote looked something like this when freshly written (mod typing and data modeling, ofc).
Anyway I wouldn't push for a rewrite but I'll take the opportunity to tiresomely repeat that I don't know many tactics with better entropy-avoiding ROI than sticking to flat functions whose dependencies are all explicitly plumbed in as parameters.
@bhosmer here's a suggested restructuring: what if we just make classes to represent the closures in question? Then the closed over variables will be explicitly notated as self.blah and you can see what all the closed over variables are in one place.
Yeah, definitely seems like that would solve the readability/entropy worry of ambient referents equally well. I'm not sure as an idiom it's any less cumbersome to write than non-closure functions with everything explicitly passed in, but I'm definitely interested to see how it turns out, if you're down to give it a spin 😁
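The closures-as-classes idea can be sketched like so (hypothetical names; the real gen.py computation is much richer):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ComputeFunction:
    # Every formerly closed-over variable becomes an explicit,
    # greppable field instead of an ambient referent.
    target: str              # e.g. "REGISTRATION" or "DEFINITION"
    dispatch: Optional[str]  # e.g. "CPU", or None for catch-all

    def __call__(self, func_name: str) -> str:
        if self.target == "REGISTRATION":
            return f'm.impl("{func_name}", TORCH_FN({func_name}));'
        return f"// definition of {func_name} for {self.dispatch}"

gen = ComputeFunction(target="REGISTRATION", dispatch="CPU")
print(gen("upsample_nearest1d"))
# m.impl("upsample_nearest1d", TORCH_FN(upsample_nearest1d));
```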
tools/codegen/gen.py (Outdated)

```python
def group_func(g: NativeFunctionGroup) -> List[str]:
    if g.structured():
        return structured_func(g)
```
IIUC the dispatch key checks here might kick us back out with [], is that ok downstream?
Err, yes? The whole point of returning [] sometimes is that sometimes you don't have a registration for a given dispatch key
Ah got it - I was reading "unsupported" in the called function as meaning unsupported by structured kernels specifically - forgot that this gets called for literally all dispatch keys. All caught up now 😁
Re the new yaml syntax bakeoff - reiterating a comment above, I'd strongly pitch explicit grouping in the surface syntax; I think it's strictly better. (Which also means I'm completely retracting the 'delegate' variation - it solves a problem that doesn't exist with explicit grouping.)

In the RFC you mention "harder to ingest for people who don't care about grouping" - if you mean humans, I can't imagine any reading mode where the grouped syntax isn't better - clearer, more informative about the way the system structures things. If you mean software, I'm pretty sure it's easier to ungroup on load than it is to group, like the main codegen script does currently. As a minor point, I wouldn't introduce a new keyword either.
For posterity's sake: after @bhosmer's most recent post about the syntax, we discussed various options in our meeting, and concluded that while rounding up all the syntax would make sense, it still should be treated as an orthogonal problem. So I'm going to do some version of #45277 (comment) in the short term.
Implements structured kernels as per pytorch/rfcs#9 and ports upsample_nearest1d to use the framework. The general structure of this diff:

- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model
- NativeFunctionGroup turns into StructuredNativeFunctions. Previously I thought that maybe we would use this grouping mechanism for both structured and unstructured kernels, but it turned out that Jiakai needed to make his own grouping structure. So now I've specialized it for structured kernels, which also means I get to add a bunch of invariants, like requiring structured kernels to have both a functional and an out variant. This is the lower bundle of changes in tools.codegen.model
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by actual operators we port as we go.
- The meaty part of code generation is what we do when we have some grouped StructuredNativeFunctions. We continue to generate a wrapper per function type, but they are a bit different, as they call your meta functions and make reference to the actual implementations in out.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.

Missing pieces:
- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D24253555](https://our.internmc.facebook.com/intern/diff/D24253555)
[ghstack-poisoned]
@bhosmer I fixed up all the hacks. The structure changed quite a bit because a lot of preliminary refactoring got moved earlier. I updated the PR description. I'm going to move towards landing, but take a look at the new version.
```diff
  out = d.get(SchemaKind.out)
- return NativeFunctionGroup(
+ if functional is None or out is None or not out.structured:
+     return None
```
Should it be an error to specify out as structured without declaring a functional variant? (Or not an error but an assert, if it's already been checked upstream)
Don't know if there are callers that need this to be permissive, but e.g. the code path that runs through flatten_pre_group() looks like it might benefit from error checking.
It's checked separately, in validate_unstructured. I didn't put error tests here because I don't have error context available.
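The permissive-grouping-plus-separate-validation split can be sketched as follows (hypothetical dict-based stand-ins for the real dataclasses and error machinery):

```python
from typing import Dict, Optional

def from_dict(d: Dict[str, dict]) -> Optional[dict]:
    # Permissive: incomplete clusters simply aren't structured groups.
    functional = d.get("functional")
    out = d.get("out")
    if functional is None or out is None or not out.get("structured"):
        return None
    return {"functional": functional, "inplace": d.get("inplace"), "out": out}

def validate_unstructured(d: Dict[str, dict], loc: str) -> None:
    # Separate pass where error context (loc) is available, so the
    # message can point at the offending declaration.
    out = d.get("out")
    if out is not None and out.get("structured") and "functional" not in d:
        raise RuntimeError(f"{loc}: structured out= kernel has no functional variant")

complete = {"functional": {}, "out": {"structured": True}}
broken = {"out": {"structured": True}}
print(from_dict(complete) is not None)  # True
print(from_dict(broken))                # None
```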
```python
# defined. This is for conveniently reporting error messages!
loc: 'Location'

# Whether or not this out function is a "structured kernel". Structured

# Whether or not this non-out function is a structured kernel, defined
# in terms of the out kernel referenced by the string here.
structured_delegate: Optional['OperatorName']
```
Data modeling OCD obliges me to note the room for nonsense in having both structured and structured_delegate floating around in the representation 😁
In a perfect world we'd have a single property whose type depended on func.kind, but in the absence of that, there's no great choice (assuming making SchemaKind into an ADT that carries this information directly is either impossible in mypy, or unwanted for some other reason).
In the vein of "tighten the representation and provide derived properties for convenience", here's a half-hearted pitch for `structured: Optional['OperatorName']`, where out functions just use '' or whatever, `is_structured()` just tests `is not None`, and `get_structured_delegate()` could do the obvious.
Yeah, this choice was driven by what I wanted the surface syntax to look like, and then trying to keep the NativeFunction class looking similar. What do you think is clearer:

```yaml
- func: foo.out
  structured: None
- func: foo.Tensor
  structured: foo.out
```

or

```yaml
- func: foo.out
  structured: true
- func: foo.Tensor
  structured_delegate: foo.out
```
#2. I think "structured: None means it's structured" would be hella confusing. (More generally I think it's better in the surface syntax to have the attributes be, like, monomorphic :)
Matching representation to surface syntax is a nice through line - might be worth a one-liner in the comment header for NativeFunction (apologies if it's there already and I missed it)
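A sketch of option 2 as a representation, with an invariant check guarding the nonsense states flagged above (field names follow the discussion; the out-variant detection and everything else here is a toy, not the real model.py):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NativeFunction:
    func: str                                  # e.g. "foo.out" or "foo.Tensor"
    structured: bool = False                   # only valid on out= variants
    structured_delegate: Optional[str] = None  # only valid on non-out variants

    def __post_init__(self):
        is_out = self.func.endswith(".out")  # toy out-variant detection
        assert not (self.structured and not is_out), \
            "structured: True is only valid on out= functions"
        assert not (self.structured_delegate and is_out), \
            "structured_delegate is only valid on non-out functions"

out_fn = NativeFunction(func="foo.out", structured=True)
fn = NativeFunction(func="foo.Tensor", structured_delegate="foo.out")
print(out_fn.structured, fn.structured_delegate)  # True foo.out
```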
Looks great! A couple of drive-by observations, but nothing big.
Stack from ghstack:
Implements structured kernels as per pytorch/rfcs#9 and ports upsample_nearest1d to use the framework.
The general structure of this diff:
- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by actual operators we port as we go.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.

Missing pieces:
- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work
This PR improves instruction counts on `upsample_nearest1d` because it eliminates an extra redispatch. Testing `at::upsample_nearest1d(x, {10});`. These numbers may be jittered up to ±16400 (which is the difference when I tested against an unaffected operator, `at::upsample_linear1d`), though that may also be because unrelated changes affected all operators globally.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D24253555