add slow path for is_contiguous by george-qi · Pull Request #77396 · pytorch/pytorch

george-qi · 2022-05-13T01:00:12Z

Stack from ghstack:

-> add slow path for is_contiguous #77396

Implementation is done by roughly following the template from Issue 65423

Adding an argument uses_python_dispatch (defaulted to False) to _make_subclass and _make_wrapper_subclass
If True, then I'll set_python_dispatch(true) and set sizes_strides_policy_ to SizesStridesPolicy::CustomStrides
In is_contiguous, if both of the above are true, then I'll call the new method on PyInterpreter

This should avoid slowing down the fast path of is_contiguous.

I don't have a ton of context around python_dispatch, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to SizesStridesPolicy, like PythonCustomStrides, and then instead of checking both is_python_dispatch() and sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides, I would just check sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides

Differential Revision: D36527481

[ghstack-poisoned]

facebook-github-bot · 2022-05-13T01:00:18Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/77396
📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓Need help or want to give feedback on the CI? Visit our office hours

✅ No Failures (50 Pending)

As of commit 3d103ba (more details on the Dr. CI page):

Expand to see more

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

ghstack-source-id: 3064fb5 Pull Request resolved: #77396

ezyang · 2022-05-13T01:40:48Z

Looking over the implementation it looks like my outline was incomplete (also the outline predates policy consolidation), so there are a few more things to hit.

After policy consolidation, CustomStrides also causes strides() to be virtualized. Strides will then hit strides_custom which will raise an error, as this is the default implementation. So if we do this, we also have to make strides overrideable by Python in the same way.
This leads to a second problem, which is that once you flip on CustomStrides, there is literally no way to get at the underlying strides field on the TensorImpl object. Ordinarily, as a TensorImpl subclass you would have access to the protected strides_default method, but this is not an option for Python dispatch. It would be good if super().strides() did the right thing but I'm not exactly sure how to go about doing this. Maybe this is not a big deal as the point is you're opting out of the built in data.
It would be better not to put the is_python_dispatch test directly inline in the is_contiguous method, as it is called from a lot of places and by putting it in the inline test you're forcing a bunch of extra cold code at all of the call sites, bloating icache. What is less certain is how exactly we should move it out of line. The minimal change is to place it inside the default virtual method implementation itself; this is cheap and easy to do but if a TensorImpl subclass overrides that virtual method it would be easy to forget to dispatch to Python if necessary and then this codepath would go dark (depending on the use case, though, this might not actually be a problem) If we need to guarantee this path gets hit for a subclass, no matter the TensorImpl, we could revive the non-virtual method which checks if python_dispatch before deciding what to do.

This also leaves me a question which is whether or not this is the best implementation strategy. An alternative implementation strategy would have been to add is_contiguous to native_functions.yaml, and then use manual_cpp_binding to maintain this method in TensorBase

  bool is_contiguous(at::MemoryFormat memory_format=at::MemoryFormat::Contiguous) const {
    return impl_->is_contiguous(memory_format);
  }

but have a test for "is Python subclass" here. But there are benefits to the approach in this PR: it is faster for the common case (since we only do one branch, not two) and it avoids having to deal with TorchScript related fallout from them having defined is_contiguous directly in TS primitives. So let's leave it as is.

c10/core/TensorImpl.h

torch/csrc/autograd/python_variable.cpp

Implementation is done by roughly following the template from [Issue 65423](#65423) - Adding an argument `uses_python_dispatch` (defaulted to False) to `_make_subclass` and `_make_wrapper_subclass` - If True, then I'll `set_python_dispatch(true)` and set `sizes_strides_policy_` to `SizesStridesPolicy::CustomStrides` - In `is_contiguous`, if both of the above are true, then I'll call the new method on PyInterpreter This should avoid slowing down the fast path of `is_contiguous`. I don't have a ton of context around `python_dispatch`, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to `SizesStridesPolicy`, like `PythonCustomStrides`, and then instead of checking both `is_python_dispatch()` and `sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides`, I would just check `sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides` [ghstack-poisoned]

ghstack-source-id: 205b88b Pull Request resolved: #77396

george-qi · 2022-05-13T23:36:39Z

Thanks for the review and discussion in the meeting, super helpful!

After policy consolidation, CustomStrides also causes strides() to be virtualized. Strides will then hit strides_custom which will raise an error, as this is the default implementation. So if we do this, we also have to make strides overrideable by Python in the same way.

As discussed in the meeting, I'm currently limiting this diff to is_contiguous and the next to contiguous and punting on the others for now. Let me know if you feel strongly about supporting any of the other operations

This leads to a second problem, which is that once you flip on CustomStrides, there is literally no way to get at the underlying strides field on the TensorImpl object. Ordinarily, as a TensorImpl subclass you would have access to the protected strides_default method, but this is not an option for Python dispatch. It would be good if super().strides() did the right thing but I'm not exactly sure how to go about doing this. Maybe this is not a big deal as the point is you're opting out of the built in data.

Good call, and as you point out, for us, not having the underlying strides field is indeed okay for us

It would be better not to put the is_python_dispatch test directly inline in the is_contiguous method, as it is called from a lot of places and by putting it in the inline test you're forcing a bunch of extra cold code at all of the call sites, bloating icache. What is less certain is how exactly we should move it out of line. The minimal change is to place it inside the default virtual method implementation itself; this is cheap and easy to do but if a TensorImpl subclass overrides that virtual method it would be easy to forget to dispatch to Python if necessary and then this codepath would go dark (depending on the use case, though, this might not actually be a problem) If we need to guarantee this path gets hit for a subclass, no matter the TensorImpl, we could revive the non-virtual method which checks if python_dispatch before deciding what to do.

Thanks for the explanation and the learnings :) After a quick discussion with Christian, I've changed it the minimal change -- that is, putting it in the virtual method and leaving this up to the devs of tensor subclasses to add the Python dispatch if they'd like.

Please let me know if there are any other outstanding things!

torch/csrc/autograd/python_variable.cpp

Implementation is done by roughly following the template from [Issue 65423](#65423) - Adding an argument `uses_python_dispatch` (defaulted to False) to `_make_subclass` and `_make_wrapper_subclass` - If True, then I'll `set_python_dispatch(true)` and set `sizes_strides_policy_` to `SizesStridesPolicy::CustomStrides` - In `is_contiguous`, if both of the above are true, then I'll call the new method on PyInterpreter This should avoid slowing down the fast path of `is_contiguous`. I don't have a ton of context around `python_dispatch`, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to `SizesStridesPolicy`, like `PythonCustomStrides`, and then instead of checking both `is_python_dispatch()` and `sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides`, I would just check `sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides` [ghstack-poisoned]

ghstack-source-id: f23c099 Pull Request resolved: #77396

Implementation is done by roughly following the template from [Issue 65423](#65423) - Adding an argument `uses_python_dispatch` (defaulted to False) to `_make_subclass` and `_make_wrapper_subclass` - If True, then I'll `set_python_dispatch(true)` and set `sizes_strides_policy_` to `SizesStridesPolicy::CustomStrides` - In `is_contiguous`, if both of the above are true, then I'll call the new method on PyInterpreter This should avoid slowing down the fast path of `is_contiguous`. I don't have a ton of context around `python_dispatch`, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to `SizesStridesPolicy`, like `PythonCustomStrides`, and then instead of checking both `is_python_dispatch()` and `sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides`, I would just check `sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides` [ghstack-poisoned]

ghstack-source-id: c302260 Pull Request resolved: #77396

This reverts commit f6beda8. Reverted #77396 on behalf of https://github.com/malfet

malfet · 2022-05-19T17:08:21Z

By the way, look like this should have been marked as BC breaking?
Also, this failure (at least on compilation level) could have been easily avoided by adding default value to new argument

albanD · 2022-05-19T18:50:23Z

By the way, look like this should have been marked as BC breaking?

I don't think so, torchdistx is using private APIs. So this is not BC breaking.

malfet · 2022-05-19T20:44:37Z

By the way, look like this should have been marked as BC breaking?

I don't think so, torchdistx is using private APIs. So this is not BC breaking.

Hmm, are all APIs in C10 consider to be private?

george-qi · 2022-05-19T20:56:30Z

@george-qi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

albanD · 2022-05-19T21:02:35Z

Hmm, are all APIs in C10 consider to be private?

That's a good question. At least our FC/BC guarantees don't apply to them. Just like we're "breaking" aten APIs all the time without warnings.

Summary: Pull Request resolved: #77906 This is a re-landing of #77396 with a patch to torchdistx so that internal CI's don't break Test Plan: CI Reviewed By: malfet, b0noI Differential Revision: D36493890 fbshipit-source-id: 8373f0bfa3803d2c8bc6dd5178b13205c39ea1f0

Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors). [ghstack-poisoned]

Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors). Differential Revision: [D36618465](https://our.internmc.facebook.com/intern/diff/D36618465) [ghstack-poisoned]

Summary: Pull Request resolved: #77684 Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors). Test Plan: Imported from OSS Reviewed By: Chillee Differential Revision: D36618465 Pulled By: eellison fbshipit-source-id: b9dcf480b9e78cbe46c776f1244e7259d871b379

This follows the template in #77396 ghstack-source-id: 9b6a786 Pull Request resolved: #78691

This follows the template in #77396 [ghstack-poisoned]

This follows the template in #77396 ghstack-source-id: c21da4c Pull Request resolved: #78691

This follows the template in #77396 Pull Request resolved: #78691 Approved by: https://github.com/ezyang

Summary: This follows the template in #77396 Pull Request resolved: #78691 Approved by: https://github.com/ezyang Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/22b10873f375cb0d582f4567525238f716c430ca Reviewed By: malfet, b0noI Differential Revision: D36882361 fbshipit-source-id: 67b95f50c40b09514d958526bc6126d314cce7c5

add slow path for is_contiguous

29774c4

[ghstack-poisoned]

george-qi requested review from albanD and soulitzer as code owners May 13, 2022 01:00

facebook-github-bot added the cla signed label May 13, 2022

george-qi added a commit that referenced this pull request May 13, 2022

add slow path for is_contiguous

4c5e22c

ghstack-source-id: 3064fb5 Pull Request resolved: #77396

george-qi requested review from cpuhrsch and ezyang May 13, 2022 01:01

george-qi added the ciflow/trunk Trigger trunk jobs on your pull request label May 13, 2022