
Move argument grouping into FunctionSchema#48195

Closed
ezyang wants to merge 5 commits into gh/ezyang/869/base from gh/ezyang/869/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Nov 18, 2020

Stack from ghstack:

The general approach is to change Arguments, splitting `positional`, `kwarg_only` and `out`, into `pre_self_positional`, `self_arg`, `post_self_positional`, and `pre_tensor_options_kwarg_only`, `tensor_options` and `post_tensor_options_kwarg_only`. The splits are as you'd expect: we extract out the self argument and the tensor options arguments, and record the other arguments that came before and after. To do this, we move the logic in `group_arguments` to the parsing process.

Some fuzz in the process:

  • I renamed `ThisArgument` to `SelfArgument`, since we don't actually use the terminology "this" outside of C++ (and the model is Python-biased)
  • I kept the `group_arguments` function, which now just reads out the arguments from the structured model in the correct order. In the long term, we should get rid of this function entirely, but for now I kept it as is to reduce churn.
  • I decided to arbitrarily say that when `self` is missing, everything goes in "post-self", but when tensor options is missing, everything goes in "pre-tensor-options". This was based on where you typically find the argument in question: `self` is usually at the front (so most args come after it), while tensor options are typically at the end (so most args come before it).
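The split described above can be sketched as a small dataclass. This is a simplified, illustrative stand-in for the real model in tools/codegen: plain strings substitute for the actual `Argument`/`SelfArgument`/`TensorOptionsArguments` types, and only the grouping logic is shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Plain strings stand in for the real model types (hypothetical simplification).
Argument = str

@dataclass(frozen=True)
class Arguments:
    # positional = pre_self_positional + self_arg + post_self_positional
    pre_self_positional: Tuple[Argument, ...]
    self_arg: Optional[Argument]
    post_self_positional: Tuple[Argument, ...]
    # kwarg_only = pre_tensor_options_kwarg_only + tensor_options
    #            + post_tensor_options_kwarg_only
    pre_tensor_options_kwarg_only: Tuple[Argument, ...]
    tensor_options: Optional[Tuple[Argument, ...]]
    post_tensor_options_kwarg_only: Tuple[Argument, ...]
    out: Tuple[Argument, ...]

    def positional(self) -> Tuple[Argument, ...]:
        # When self is missing, everything lives in post_self_positional.
        mid = (self.self_arg,) if self.self_arg is not None else ()
        return self.pre_self_positional + mid + self.post_self_positional

    def kwarg_only(self) -> Tuple[Argument, ...]:
        # When tensor options are missing, everything lives in "pre".
        to = self.tensor_options or ()
        return (self.pre_tensor_options_kwarg_only + to
                + self.post_tensor_options_kwarg_only)

# empty.names has no self, so all positional args land in post-self.
empty_names = Arguments(
    pre_self_positional=(), self_arg=None,
    post_self_positional=("int[] size",),
    pre_tensor_options_kwarg_only=("Dimname[]? names",),
    tensor_options=("ScalarType? dtype=None", "Layout? layout=None",
                    "Device? device=None", "bool? pin_memory=None"),
    post_tensor_options_kwarg_only=("MemoryFormat? memory_format=None",),
    out=(),
)
```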

Signed-off-by: Edward Z. Yang ezyang@fb.com

Differential Revision: D25231166

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
@ezyang
Contributor Author

ezyang commented Nov 18, 2020

sorry short on time, no description; coming soon

@dr-ci

dr-ci bot commented Nov 18, 2020

💊 CI failures summary and remediations

As of commit 00c8118 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 02 04:56:20 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 02 04:56:20 At: 
Dec 02 04:56:20   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 02 04:56:20   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 02 04:56:20  
Dec 02 04:56:21 ok (2.453s) 
Dec 02 04:56:22   test_return_future_remote (__main__.ProcessGroupRpcTestWithSpawn) ... RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. 


Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Nov 18, 2020
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 85a385f
Pull Request resolved: #48195
@ezyang ezyang mentioned this pull request Nov 18, 2020

@bhosmer bhosmer left a comment


Just a quick drive-by based on the PR description, front-running a full review later: the choices you're describing in your last bullet point kind of point to the brittleness of modeling the sequence in a maximally partitioned way like this. An alternative (I've pitched this before) is to model self and tensor options as optional pairs of (thing, insertion index) - it's true that the indexes wouldn't be correct by construction, but it avoids the pre-post splintering, and the arbitrary choices (and dynamic checks) for degenerate cases. Still worth modeling args and kwargs as separate lists ofc.

@ezyang
Contributor Author

ezyang commented Nov 18, 2020

@bhosmer Engh, I still don't like it. I gave one reason (tricky bookkeeping if you ever need multiple indices--though tbf we don't have this right now) in our original conversation. But another reason why I don't really like splicing arguments into the lists is that the list prior to splicing isn't really usable for... anything, really.

Imagine:

class Arguments:
    kwarg_only: Tuple[Argument, ...]
    tensor_options: Optional[Tuple[TensorOptionsArguments, int]]

Does `kwarg_only` contain tensor options arguments or not? You seem to be favoring "make illegal states unrepresentable", so let's say that it doesn't (because if it did, we would need an extra consistency check that the duplicate copies in `kwarg_only` and `tensor_options` agree). So then, given the schema `empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor`, we have something like this:

Arguments(
  kwarg_only=("Dimname[]? names", "MemoryFormat? memory_format=None"),
  tensor_options=(("ScalarType? dtype=None", "Layout? layout=None", "Device? device=None", "bool? pin_memory=None"), 1),
)

Using the `kwarg_only` field directly is useless. There is never a time I want `names` and `memory_format` to be adjacent in an arguments list. I have to do the splice first, or split this into the pre and post parts, before I can do anything useful with it. In the pre/post variant, I have pre-done the splitting, because that's what I'm ~always going to want to have happened.

If it's just getting rid of illegal states that we want, the pre/post version can be made to play ball like this:

class Arguments:
    pre_kwarg_only: Tuple[Argument, ...]
    tensor_options_and_post_kwarg_only: Optional[Tuple[TensorOptionsArguments, Tuple[Argument, ...]]]

but the extra typing here didn't seem worth it to me, at least.
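Concretely, with the pre/post representation the flat kwarg-only list falls out of plain concatenation, with no splice index to maintain. A toy sketch using the empty.names arguments, with strings standing in for the model types:

```python
# Hypothetical pre/post split of empty.names' kwarg-only arguments.
pre_tensor_options_kwarg_only = ("Dimname[]? names",)
tensor_options = ("ScalarType? dtype=None", "Layout? layout=None",
                  "Device? device=None", "bool? pin_memory=None")
post_tensor_options_kwarg_only = ("MemoryFormat? memory_format=None",)

# Reading out the flat list is pure concatenation: the "splitting"
# has already been done at parse time.
kwarg_only = (pre_tensor_options_kwarg_only
              + tensor_options
              + post_tensor_options_kwarg_only)
```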

@bhosmer

bhosmer commented Nov 18, 2020

@ezyang thanks for writing out the example, it hits the nail on the head. You're exactly right, the taste/distaste for this approach comes down to whether you feel like the ground representation should be "useful" on its own, or if it's just raw material for derived views.

Will a full review introduce me to client code that uses the pre/post fragments directly (like in a semantically meaningful way)? If so then I'd agree that what I'm pitching is probably a net loss because it'd have to add derived views for both the full list and the fragments. I'd assumed the full list was the only view that mattered, which I think would put the two representations roughly at parity in terms of view complexity.

You might also be reacting to (per your example) kwargs_only being misreadable as a finished list, but I'd think naming and/or commenting would take care of that. But is that part of it?

if self.tensor_options is not None:
    ret.extend(self.tensor_options.all())
ret.extend(self.post_tensor_options_kwarg_only)
return ret
Contributor Author


This and positional above are a use of the fields.



Ah ok, this is what I meant by derived views. By parity I meant that the above and something like below (mod typos) are basically the same complexity, and you wouldn't have to tiebreak the degenerate representations.

non_self_positional: Tuple[Argument, ...]
self_arg: Optional[Tuple[SelfArgument, int]]

non_tensor_options_kwarg_only: Tuple[Argument, ...]
tensor_options: Optional[Tuple[TensorOptionsArguments, int]]

def positional(self) -> Sequence[Argument]:
  ret: List[Argument] = []
  ret.extend(self.non_self_positional)
  if self.self_arg is not None:
    arg, i = self.self_arg
    ret.insert(i, arg)
  return ret

def kwarg_only(self) -> Sequence[Argument]:
  ret: List[Argument] = []
  ret.extend(self.non_tensor_options_kwarg_only)
  if self.tensor_options is not None:
    args, i = self.tensor_options
    ret[i:i] = args.all()
  return ret

But TBH it's not worth litigating, the way it is currently is fine by me if you just like it better. My main motivation here is to sketch the alternative out concretely, not to push super hard for its adoption.
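For comparison, the splice-index alternative sketched above can be made runnable like so (strings again stand in for the model types; names follow the sketch, not the actual codegen):

```python
from typing import List, Optional, Sequence, Tuple

non_tensor_options_kwarg_only: Tuple[str, ...] = (
    "Dimname[]? names", "MemoryFormat? memory_format=None")
# (packed tensor-options arguments, index to splice them back in at)
tensor_options: Optional[Tuple[Tuple[str, ...], int]] = (
    ("ScalarType? dtype=None", "Layout? layout=None",
     "Device? device=None", "bool? pin_memory=None"), 1)

def kwarg_only() -> Sequence[str]:
    ret: List[str] = list(non_tensor_options_kwarg_only)
    if tensor_options is not None:
        args, i = tensor_options
        ret[i:i] = args  # splice the packed group at its recorded index
    return ret
```

Both representations flatten to the same list for empty.names; the trade-off is whether the unspliced `non_tensor_options_kwarg_only` (where `names` and `memory_format` sit adjacent) is a view anyone ever wants on its own.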

args.extend(func.arguments.pre_tensor_options_kwarg_only)
if func.arguments.tensor_options is not None:
    args.append(func.arguments.tensor_options)
args.extend(func.arguments.post_tensor_options_kwarg_only)
Contributor Author


This is another use of the fields, although the intention is to kill this function and make use of the fields directly.

@ezyang
Contributor Author

ezyang commented Nov 18, 2020

Will a full review introduce me to client code that uses the pre/post fragments directly (like in a semantically meaningful way)?

I went ahead and marked the relevant spots. But I didn't really try to refactor anything else after this, so most of the usage is just "squashing" the individual parts back into the list-of-union representation, which is what the rest of the code uses right now.

Projecting ahead to further refactors, the other things most likely to happen:

  1. For methods, you need to drop the `self` parameter when it doesn't show up in the C++ signature (so actually your rep is good for this)
  2. When translating to cpp/dispatcher, depending on `use_c10_full` you will either expand or keep packed the tensor options argument, but it will always be in the same position compared to the other arguments. So splicing will be necessary here
  3. Whatever happens to Faithful out arguments #47712; but most of the heat here is on out arguments, so this is essentially the same as (2) (where you are going to do the translation to cpp/dispatcher)

You might also be reacting to (per your example) kwargs_only being misreadable as a finished list, but I'd think naming and/or commenting would take care of that.

You got me, I was intentionally being a little hyperbolic in the example naming ;)


@bhosmer bhosmer left a comment


LGTM!

# If there is enough space...
if i <= len(kwarg_only) - len(predicates):
    # And the next len(predicates) arguments look like TensorOptions arguments
    if all(p(a) for p, a in zip(predicates, kwarg_only[i : i + len(predicates)])):


It still feels like a small devex pothole to be failing silently on near misses here, but probably out of scope for this work.

Contributor Author


Yeah, this is preexisting code, so out of scope for this PR. Note that we do have plenty of "near-ish misses" (e.g., one or two arguments from TensorOptions) in various functions, including

- func: to.device(Tensor self, Device device, ScalarType dtype, bool non_blocking=False, bool copy=False, MemoryFormat? memory_format=None) -> Tensor
  use_c10_dispatcher: full
  variants: method
  device_guard: False

so it's not altogether clear to me what a "you mistyped this" error message would look like.
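A minimal sketch of the kind of pattern match under discussion, where the predicates and argument spellings are illustrative stand-ins rather than the actual codegen: a full TensorOptions run is recognized, and anything else, including near misses like to.device's lone dtype argument, silently falls through.

```python
from typing import Callable, List, Sequence

# One predicate per expected TensorOptions field, in declaration order
# (illustrative: the real codegen matches on parsed Argument objects).
predicates: List[Callable[[str], bool]] = [
    lambda a: a.startswith("ScalarType"),
    lambda a: a.startswith("Layout"),
    lambda a: a.startswith("Device"),
    lambda a: a.startswith("bool? pin_memory"),
]

def find_tensor_options(kwarg_only: Sequence[str]) -> int:
    """Return the start index of a full TensorOptions run, or -1.

    A partial run (one or two tensor-options-like arguments) fails the
    all(...) check and falls through silently: the "near miss" behavior
    noted above.
    """
    for i in range(len(kwarg_only)):
        # Only attempt a match if there is enough space left...
        if i <= len(kwarg_only) - len(predicates):
            # ...and the next len(predicates) arguments all look the part.
            if all(p(a) for p, a in
                   zip(predicates, kwarg_only[i:i + len(predicates)])):
                return i
    return -1
```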


@bhosmer bhosmer Nov 30, 2020


Yeah, the line isn't trivially easy to draw. I guess if the goal state is "nobody uses TensorOptions except for factory functions in the user C++ API", then the most robust setup might be to drive the gather logic from an explicit annotation in native_functions rather than a pattern match - could even just use `category_override: factory`.

ezyang added a commit to ezyang/pytorch that referenced this pull request Dec 1, 2020
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 249fbfa
Pull Request resolved: pytorch#48195
ezyang added a commit to ezyang/pytorch that referenced this pull request Dec 2, 2020
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 249fbfa
Pull Request resolved: pytorch#48195
ezyang added a commit to ezyang/pytorch that referenced this pull request Dec 2, 2020
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: cd0aa6e
Pull Request resolved: pytorch#48195
@facebook-github-bot
Contributor

@ezyang merged this pull request in 742903c.

shaibagon pushed a commit to shaibagon/pytorch that referenced this pull request Dec 3, 2020
Summary:
Pull Request resolved: pytorch#48195

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25231166

Pulled By: ezyang

fbshipit-source-id: 25d77ad8319c4ce0bba4ad82e451bf536ef823ad
@facebook-github-bot deleted the gh/ezyang/869/head branch December 6, 2020 15:17
ezyang added a commit that referenced this pull request Dec 8, 2020
Previously, this function had nontrivial algorithmic content,
but after #48195, this was just a swiss army knife for pasting
together arguments while maintaining structure.  I added some
more properties for Arguments for convenient access in this way,
and then inlined the implementation of group_arguments into all of its call
sites, simplifying whenever contextual.  This might be controversial, but I
think the resulting code is easier to understand.

You may notice that there is some modest code duplication between
dispatcher.cpparguments_exprs and CppSignature.argument_packs.
This is a known problem and I will be attempting to fix it in
a follow up PR.

Confirmed to be byte-for-byte compatible.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Dec 8, 2020

ghstack-source-id: 7ff71ff
Pull Request resolved: #49043
ezyang added a commit that referenced this pull request Dec 9, 2020
ezyang added a commit that referenced this pull request Dec 10, 2020
facebook-github-bot pushed a commit that referenced this pull request Dec 11, 2020
Summary:
Pull Request resolved: #49043

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455885

Pulled By: ezyang

fbshipit-source-id: 8fbe066e8c3cb7ee8adb5b87296ec5bd7b49e01f