Add Autocast Support for FakeTensors / use fake device dispatch keys #82449

eellison wants to merge 8 commits into gh/eellison/301/base
Conversation
[ghstack-poisoned]
✅ No Failures (0 Pending) as of commit f81d77c (more details on the Dr. CI page). Looks good so far! There are no failures yet. This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
…patch keys"

From PR:

```
Note: [Fake Tensor Dispatch Keys]
In order to model the behavior of device-specific autocast and autograd logic,
we update the dispatch keys of FakeTensors to reflect their fake device. This
includes the BackendComponent (DispatchKey::Meta -> DispatchKey::CUDA), and
also the BackendComponent-related Autocast and Autograd keys.
__torch_dispatch__ sits below Autocast and Autograd, and is only invoked when
we are at the kernel for the BackendComponent. Then, we add Meta to the
thread-local dispatch include set to hit the meta kernel instead of the kernel
of the BackendComponent for the fake device.
```

Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge.

[ghstack-poisoned]
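The note above can be sketched as a toy model. This is not the real c10 dispatcher; the key names mirror real `DispatchKey` values, but the sets and resolution order here are simplified purely for illustration. A FakeTensor with a fake `cuda` device carries CUDA-flavoured keys (the Meta -> CUDA promotion), so the Autocast and Autograd layers for that device run first, and the thread-local include set then redirects the backend step to the meta kernel:

```python
# Toy model of the dispatch order described in the note above -- NOT the
# real c10 implementation. Autocast sits above Autograd, which sits above
# the backend kernel; Meta in the TLS include set wins at the backend step.

def dispatch_trace(tensor_keys, tls_include):
    """Return the sequence of dispatch keys an op call would pass through."""
    trace = []
    if "AutocastCUDA" in tensor_keys:
        trace.append("AutocastCUDA")   # device-specific autocast logic runs
    if "AutogradCUDA" in tensor_keys:
        trace.append("AutogradCUDA")   # device-specific autograd logic runs
    # Backend step: Meta in the thread-local include set redirects the call
    # to the meta kernel instead of the fake device's real backend kernel.
    trace.append("Meta" if "Meta" in tls_include else "CUDA")
    return trace

# A FakeTensor's keys after the Meta -> CUDA promotion:
fake_keys = {"CUDA", "AutogradCUDA", "AutocastCUDA"}
```

With `{"Meta"}` in the TLS include set, the trace is `["AutocastCUDA", "AutogradCUDA", "Meta"]`: autocast and autograd behave as they would on a real CUDA tensor, but the actual kernel that runs is the meta one.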
```cpp
auto k = key_set.highestBackendKey();
local_keyset.included_ = local_keyset.included_.remove_backend(k);
c10::impl::_force_tls_local_dispatch_key_set(local_keyset);
});
```
feels like it would be better to bind one of the local TLS RAII objects. It's pretty quick and easy
I'm trying to fix the functionalization <> dynamic shape issues, and I ended up adding something pretty similar here: `_change_backend_component_keys`. If this looks good to you, we can either stamp those changes into this PR or fix this code up after my PR lands.
If you don't mind fixing up this code after my PR lands, that would be great. Thanks, Brian.
```python
# the call here
# because it doesn't go through the dispatcher, we run into errors
# when attempting to compute an output in meta, so
# we compute the real tensor then convert to meta
```
But this means that `new` is no longer memory efficient, is that right?
Yeah, this is sort of a deprecated API that doesn't go through the dispatcher, so it would make sense that it might be fragile to changes. As with fallbacks, it seemed okay to allocate a real tensor temporarily.
OK. It's just odd that although we have explicit handling for it, we aren't able to just go ahead and write the "correct" fake rule for it. But I suppose it might be annoying to do.
```cpp
KERNEL_CPU(ADD_NS(conv2d), "conv2d", Tensor (const Tensor &, const Tensor &, const c10::optional<Tensor>&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t), lower_precision_fp)
KERNEL_CPU(ADD_NS(conv2d), "conv2d.padding", Tensor (const Tensor&, const Tensor&, const c10::optional<Tensor>&, IntArrayRef, c10::string_view, IntArrayRef, int64_t groups), lower_precision_fp)
KERNEL_CPU(ADD_NS(conv3d), "conv3d", Tensor (const Tensor &, const Tensor &, const c10::optional<Tensor>&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t), lower_precision_fp)
KERNEL_CPU(ADD_NS(conv3d), "conv3d.padding", Tensor (const Tensor&, const Tensor&, const c10::optional<Tensor>&, IntArrayRef, c10::string_view, IntArrayRef, int64_t groups), lower_precision_fp)
```
Do you have a more detailed explanation for these? It's a bit confusing.
aten::conv2d.padding redispatches to aten::convolution, which doesn't have an autocast kernel registered for it; _convolution does. It's only by invoking _convolution inside the CPU kernel that autocast gets applied. Since the meta kernel for aten::convolution won't call _convolution, without these changes no autocasting happens.
So I mirrored the existing pattern for conv1d and added the padding overloads to the registration list.
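The divergence being fixed here can be modeled with a toy rule set. This is not the real `KERNEL_CPU` machinery, just an illustration of why a missing registration makes the FakeTensor dtype diverge: ops registered with the `lower_precision_fp` policy have their float32 inputs cast down before the kernel runs, while unregistered ops pass the dtype through unchanged:

```python
# Toy model of the autocast rule set -- NOT the real KERNEL_CPU machinery.
# Ops registered with lower_precision_fp get float32 inputs cast down;
# unregistered ops run with their inputs' dtype untouched.

AUTOCAST_RULES = {
    "conv1d.padding": "lower_precision_fp",   # the registrations this PR adds
    "conv2d.padding": "lower_precision_fp",
    "conv3d.padding": "lower_precision_fp",
}


def run_with_autocast(op, input_dtype, lower_dtype="float16"):
    """Return the dtype the op's output would have under autocast."""
    if AUTOCAST_RULES.get(op) == "lower_precision_fp" and input_dtype == "float32":
        return lower_dtype   # inputs cast down, output is lower precision
    return input_dtype       # no rule: dtype passes through unchanged
```

Before the fix, a real CUDA run would reach the lower-precision rule (via _convolution) and produce float16, while the meta kernel, never hitting the rule, kept float32: exactly the dtype divergence the PR description mentions.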
oof. I guess if we turned off autocast after we got past the autocast layer this would prevent that, but we don't really want to pay for it. :/ There may be other latent bugs like this.
At the risk of micro-optimizing: one thing we could do, if we're worried about the boxing/unboxing cost of the fallback (although we'd still have to pay for the redispatch + TLS), is teach the dispatcher not to box/unbox anything when the registered boxed fallback doesn't take in an argument stack (indicating that it doesn't use the arguments).
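The suggested optimization can be sketched with a toy dispatcher. All names here are invented for illustration; the point is only the shape of the idea: if the registered fallback declares that it ignores its argument stack, the dispatcher skips materializing (boxing) the arguments entirely:

```python
# Toy sketch of the suggested micro-optimization -- names invented here.
# A dispatcher that only boxes arguments when the registered boxed
# fallback actually declares that it needs the argument stack.

def make_dispatcher(fallback, fallback_needs_args):
    def dispatch(op, *args):
        if fallback_needs_args:
            stack = list(args)       # boxing: pack every argument into a stack
            return fallback(op, stack)
        return fallback(op, None)    # no boxing -- the stack is never built
    return dispatch
```

A fallback that only flips some TLS state and redispatches, like the one discussed above, would register with `fallback_needs_args=False` and never pay for boxing, though the redispatch and TLS costs remain.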
…patch keys"

From PR:

```
Note: [Fake Tensor Dispatch Keys]
In order to model the behavior of device-specific autocast and autograd logic,
we update the dispatch keys of FakeTensors to reflect their fake device. This
includes the BackendComponent (DispatchKey::Meta -> DispatchKey::CUDA), and
also the BackendComponent-related Autocast and Autograd keys.
__torch_dispatch__ sits below Autocast and Autograd, and is only invoked when
we are at the kernel for the BackendComponent. Then, we add Meta to the
thread-local dispatch include set to hit the meta kernel instead of the kernel
of the BackendComponent for the fake device.
```

Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge.

See: #81608

[ghstack-poisoned]
@pytorchbot merge -g

@pytorchbot successfully started a merge job. Check the current status here.

Hey @eellison.
…82449) (#82449)

Summary:
From PR:

```
Note: [Fake Tensor Dispatch Keys]
In order to model the behavior of device-specific autocast and autograd logic,
we update the dispatch keys of FakeTensors to reflect their fake device. This
includes the BackendComponent (DispatchKey::Meta -> DispatchKey::CUDA), and
also the BackendComponent-related Autocast and Autograd keys.
__torch_dispatch__ sits below Autocast and Autograd, and is only invoked when
we are at the kernel for the BackendComponent. Then, we add Meta to the
thread-local dispatch include set to hit the meta kernel instead of the kernel
of the BackendComponent for the fake device.
```

Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge. See: #81608

Pull Request resolved: #82449
Approved by: https://github.com/ezyang
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/642aed8b99bb35ab7029e7e839350a6d06464882
Reviewed By: kit1980
Differential Revision: D38330007
Pulled By: eellison
fbshipit-source-id: db1865a5a59c81ca66533de091f7ae1865efacd7
Stack from ghstack (oldest at bottom):
From PR:
Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge.

See: #81608