Refactor TensorIterator to do allocations via MetaBase::set_output #48659

ezyang wants to merge 4 commits into gh/ezyang/875/base
Conversation
Detailed RFC at https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md#handling-tensoriterator

What this diff does:

* Refactor allocation of outputs in TensorIterator into a call to a single function `set_output`. This nicely centralizes restriding logic and eliminates the need for a separate named tensor propagation pass.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
💊 CI failures summary and remediations. As of commit 6969196 (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
Detailed RFC at https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md#handling-tensoriterator

What this diff does:

* Refactor allocation of outputs in TensorIterator into a call to a single function `TensorIterator::set_output`. This nicely centralizes restriding logic and mostly eliminates the need for a separate named tensor propagation pass. The one exception is inplace operations (`add_`), where previously we never actually call `set_output` when we determine resizing is not necessary; there's an extra name propagation in `allocate_or_resize_outputs` to handle this case (I audited all other `set_output` sites and found that we always hit this path in that situation). Although hypothetically this could cause problems for structured kernels (which require a `set_output` call in all cases), this codepath is irrelevant for structured kernels, as a TensorIterator will never be constructed with an explicit out argument (remember, structured kernels handle out/functional/inplace variants). There's also a tricky case in `compute_types`; check the comments there for more details.
* Split TensorIterator into a TensorIteratorBase, which contains most of the logic but doesn't define `set_output`. A decent chunk of the diff is just the mechanical rename of TensorIterator to TensorIteratorBase. However, there are a few cases where we create fresh TensorIterator objects from another TensorIterator. In those cases, we always construct a fresh TensorIterator (rather than preserving the subclass of TensorIteratorBase that induced this construction). This makes sense, because a structured function class will contain metadata that isn't relevant for these downstream uses. This is done by *intentionally* permitting object slicing with the `TensorIterator(const TensorIteratorBase&)` constructor.
* Introduce a new `MetaBase` class which contains the canonical virtual method definition for `set_output`. This will allow structured classes to make use of it directly without going through TensorIterator (not in this PR).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
bhosmer left a comment:
Looks good!
Only real question is about some minor redundant logic that I think fell out of the refactoring straddle between `allocate_or_resize_outputs` and `fast_set_up` call sites. But I'd buy that this is the right sweet spot, at least for now.
```cpp
if (common_device == kCPU) {
  // Casts to outputs by creating temporaries of the correct dtype (if needed)
  if (config.cast_common_dtype_to_outputs_ && op.is_output && op.current_dtype != common_dtype_) {
    TORCH_INTERNAL_ASSERT(op.tensor.defined());
```
I'm seeing the `continue` on line 372... not seeing how we get here with `op.tensor` undefined, but maybe I'm missing it.
You can't get here with `op.tensor()` being undefined, that's why it's an assert :)

Actually, I missed the `continue` on line 372, but I was fairly confident (for other reasons) that it was impossible, but not 100% so.
```cpp
auto& op = operands_[output_idx];
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(output_idx < num_outputs_);
if (!op.tensor.defined()) {
  TORCH_INTERNAL_ASSERT(op.is_type_defined(), "no type for operand", output_idx);
```
On a quick read, it looks like a less aggressive factoring would avoid some redundant tests when this is called from `allocate_or_resize_outputs`, at the cost of leaving more code in `fast_set_up` - but I'm assuming you figure the slow path can afford it (or that there's some API requirement that it be done in `set_output` that I'm missing).
This is a good question. When I wrote this to start, I tried to assume as few preconditions on the call-site as possible, ergo the redundant tests. If I understand you correctly, it's just this assert that could be removed (I don't think any of the other asserts are removable). Seems reasonable.
```diff
   // of a lot of boilerplate above
   TensorIterator build() {
-    return TensorIterator(*this);
+    TensorIterator iter;
```
did you kill the TensorIterator(TensorIteratorConfig&) constructor just to clean things up?
You are right, I don't think I was supposed to delete this constructor, and I ended up just refactoring around the problem. Not sure if it's worth restoring it though!
Yeah, I think this is better - more transparent. The other just hid the build call.
@bhosmer I merged this because I got a clean merge window, but I still owe you updates for your comments. Coming soon.
Summary: Pull Request resolved: pytorch#48659
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D25261844
Pulled By: ezyang
fbshipit-source-id: 34a9830cccbc07eaaf7c4f75114cd00953e3db7d
Summary: Pull Request resolved: #48731 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D25278034 Pulled By: ezyang fbshipit-source-id: 73652311b48d8d80c06e9385b7ff18ef3a158ae8
Stack from ghstack:
Detailed RFC at
https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md#handling-tensoriterator
What this diff does:
* Refactor allocation of outputs in TensorIterator into a call to a single function `TensorIterator::set_output`. This nicely centralizes restriding logic and mostly eliminates the need for a separate named tensor propagation pass. The one exception is inplace operations (`add_`), where previously we never actually call `set_output` when we determine resizing is not necessary; there's an extra name propagation in `allocate_or_resize_outputs` to handle this case (I audited all other `set_output` sites and found that we always hit this path in that situation). Although hypothetically this could cause problems for structured kernels (which require a `set_output` call in all cases), this codepath is irrelevant for structured kernels, as a TensorIterator will never be constructed with an explicit out argument (remember, structured kernels handle out/functional/inplace variants). There's also a tricky case in `compute_types`; check the comments there for more details.
* Split TensorIterator into a TensorIteratorBase, which contains most of the logic but doesn't define `set_output`. A decent chunk of the diff is just the mechanical rename of TensorIterator to TensorIteratorBase. However, there are a few cases where we create fresh TensorIterator objects from another TensorIterator. In those cases, we always construct a fresh TensorIterator (rather than preserving the subclass of TensorIteratorBase that induced this construction). This makes sense, because a structured function class will contain metadata that isn't relevant for these downstream uses. This is done by *intentionally* permitting object slicing with the `TensorIterator(const TensorIteratorBase&)` constructor.
* Introduce a new `MetaBase` class which contains the canonical virtual method definition for `set_output`. This will allow structured classes to make use of it directly without going through TensorIterator (not in this PR).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D25261844