Skip to content

add slow path for is_contiguous#77396

Closed
george-qi wants to merge 9 commits intogh/george-qi/33/basefrom
gh/george-qi/33/head
Closed

add slow path for is_contiguous#77396
george-qi wants to merge 9 commits intogh/george-qi/33/basefrom
gh/george-qi/33/head

Conversation

@george-qi
Copy link
Contributor

@george-qi george-qi commented May 13, 2022

Stack from ghstack:

Implementation is done by roughly following the template from Issue 65423

  • Adding an argument uses_python_dispatch (defaulted to False) to _make_subclass and _make_wrapper_subclass
  • If True, then I'll set_python_dispatch(true) and set sizes_strides_policy_ to SizesStridesPolicy::CustomStrides
  • In is_contiguous, if both of the above are true, then I'll call the new method on PyInterpreter

This should avoid slowing down the fast path of is_contiguous.

I don't have a ton of context around python_dispatch, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to SizesStridesPolicy, like PythonCustomStrides, and then instead of checking both is_python_dispatch() and sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides, I would just check sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides

Differential Revision: D36527481

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented May 13, 2022

🔗 Helpful links

✅ No Failures (50 Pending)

As of commit 3d103ba (more details on the Dr. CI page):

Expand to see more

💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

george-qi added a commit that referenced this pull request May 13, 2022
ghstack-source-id: 3064fb5
Pull Request resolved: #77396
@george-qi george-qi requested review from cpuhrsch and ezyang May 13, 2022 01:01
@george-qi george-qi added the ciflow/trunk Trigger trunk jobs on your pull request label May 13, 2022
@ezyang
Copy link
Contributor

ezyang commented May 13, 2022

Looking over the implementation it looks like my outline was incomplete (also the outline predates policy consolidation), so there are a few more things to hit.

  • After policy consolidation, CustomStrides also causes strides() to be virtualized. Strides will then hit strides_custom which will raise an error, as this is the default implementation. So if we do this, we also have to make strides overrideable by Python in the same way.
  • This leads to a second problem, which is that once you flip on CustomStrides, there is literally no way to get at the underlying strides field on the TensorImpl object. Ordinarily, as a TensorImpl subclass you would have access to the protected strides_default method, but this is not an option for Python dispatch. It would be good if super().strides() did the right thing but I'm not exactly sure how to go about doing this. Maybe this is not a big deal as the point is you're opting out of the built in data.
  • It would be better not to put the is_python_dispatch test directly inline in the is_contiguous method, as it is called from a lot of places and by putting it in the inline test you're forcing a bunch of extra cold code at all of the call sites, bloating icache. What is less certain is how exactly we should move it out of line. The minimal change is to place it inside the default virtual method implementation itself; this is cheap and easy to do but if a TensorImpl subclass overrides that virtual method it would be easy to forget to dispatch to Python if necessary and then this codepath would go dark (depending on the use case, though, this might not actually be a problem) If we need to guarantee this path gets hit for a subclass, no matter the TensorImpl, we could revive the non-virtual method which checks if python_dispatch before deciding what to do.

This also leaves me a question which is whether or not this is the best implementation strategy. An alternative implementation strategy would have been to add is_contiguous to native_functions.yaml, and then use manual_cpp_binding to maintain this method in TensorBase

  bool is_contiguous(at::MemoryFormat memory_format=at::MemoryFormat::Contiguous) const {
    return impl_->is_contiguous(memory_format);
  }

but have a test for "is Python subclass" here. But there are benefits to the approach in this PR: it is faster for the common case (since we only do one branch, not two) and it avoids having to deal with TorchScript related fallout from them having defined is_contiguous directly in TS primitives. So let's leave it as is.

@ezyang ezyang requested a review from suo May 13, 2022 18:52
Implementation is done by roughly following the template from [Issue 65423](#65423)
- Adding an argument `uses_python_dispatch` (defaulted to False) to `_make_subclass` and `_make_wrapper_subclass`
- If True, then I'll `set_python_dispatch(true)` and set `sizes_strides_policy_` to `SizesStridesPolicy::CustomStrides`
- In `is_contiguous`, if both of the above are true, then I'll call the new method on PyInterpreter

This should avoid slowing down the fast path of `is_contiguous`.

I don't have a ton of context around `python_dispatch`, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to `SizesStridesPolicy`, like `PythonCustomStrides`, and then instead of checking both `is_python_dispatch()` and `sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides`, I would just check `sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides`

[ghstack-poisoned]
Implementation is done by roughly following the template from [Issue 65423](#65423)
- Adding an argument `uses_python_dispatch` (defaulted to False) to `_make_subclass` and `_make_wrapper_subclass`
- If True, then I'll `set_python_dispatch(true)` and set `sizes_strides_policy_` to `SizesStridesPolicy::CustomStrides`
- In `is_contiguous`, if both of the above are true, then I'll call the new method on PyInterpreter

This should avoid slowing down the fast path of `is_contiguous`.

I don't have a ton of context around `python_dispatch`, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to `SizesStridesPolicy`, like `PythonCustomStrides`, and then instead of checking both `is_python_dispatch()` and `sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides`, I would just check `sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides`

[ghstack-poisoned]
george-qi added a commit that referenced this pull request May 13, 2022
ghstack-source-id: 205b88b
Pull Request resolved: #77396
@george-qi
Copy link
Contributor Author

Thanks for the review and discussion in the meeting, super helpful!

  • After policy consolidation, CustomStrides also causes strides() to be virtualized. Strides will then hit strides_custom which will raise an error, as this is the default implementation. So if we do this, we also have to make strides overrideable by Python in the same way.

As discussed in the meeting, I'm currently limiting this diff to is_contiguous and the next to contiguous and punting on the others for now. Let me know if you feel strongly about supporting any of the other operations

  • This leads to a second problem, which is that once you flip on CustomStrides, there is literally no way to get at the underlying strides field on the TensorImpl object. Ordinarily, as a TensorImpl subclass you would have access to the protected strides_default method, but this is not an option for Python dispatch. It would be good if super().strides() did the right thing but I'm not exactly sure how to go about doing this. Maybe this is not a big deal as the point is you're opting out of the built in data.

Good call, and as you point out, for us, not having the underlying strides field is indeed okay for us

  • It would be better not to put the is_python_dispatch test directly inline in the is_contiguous method, as it is called from a lot of places and by putting it in the inline test you're forcing a bunch of extra cold code at all of the call sites, bloating icache. What is less certain is how exactly we should move it out of line. The minimal change is to place it inside the default virtual method implementation itself; this is cheap and easy to do but if a TensorImpl subclass overrides that virtual method it would be easy to forget to dispatch to Python if necessary and then this codepath would go dark (depending on the use case, though, this might not actually be a problem) If we need to guarantee this path gets hit for a subclass, no matter the TensorImpl, we could revive the non-virtual method which checks if python_dispatch before deciding what to do.

Thanks for the explanation and the learnings :) After a quick discussion with Christian, I've changed it the minimal change -- that is, putting it in the virtual method and leaving this up to the devs of tensor subclasses to add the Python dispatch if they'd like.

Please let me know if there are any other outstanding things!

Implementation is done by roughly following the template from [Issue 65423](#65423)
- Adding an argument `uses_python_dispatch` (defaulted to False) to `_make_subclass` and `_make_wrapper_subclass`
- If True, then I'll `set_python_dispatch(true)` and set `sizes_strides_policy_` to `SizesStridesPolicy::CustomStrides`
- In `is_contiguous`, if both of the above are true, then I'll call the new method on PyInterpreter

This should avoid slowing down the fast path of `is_contiguous`.

I don't have a ton of context around `python_dispatch`, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to `SizesStridesPolicy`, like `PythonCustomStrides`, and then instead of checking both `is_python_dispatch()` and `sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides`, I would just check `sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides`

[ghstack-poisoned]
george-qi added a commit that referenced this pull request May 16, 2022
ghstack-source-id: f23c099
Pull Request resolved: #77396
Implementation is done by roughly following the template from [Issue 65423](#65423)
- Adding an argument `uses_python_dispatch` (defaulted to False) to `_make_subclass` and `_make_wrapper_subclass`
- If True, then I'll `set_python_dispatch(true)` and set `sizes_strides_policy_` to `SizesStridesPolicy::CustomStrides`
- In `is_contiguous`, if both of the above are true, then I'll call the new method on PyInterpreter

This should avoid slowing down the fast path of `is_contiguous`.

I don't have a ton of context around `python_dispatch`, so please let me know if this is the intended use. The other way I imagined doing this is adding another enum to `SizesStridesPolicy`, like `PythonCustomStrides`, and then instead of checking both `is_python_dispatch()` and `sizes_strides_policy_ >= SizesStridesPolicy::CustomStrides`, I would just check `sizes_strides_policy_ == SizesStridesPolicy::PythonCustomStrides`

[ghstack-poisoned]
george-qi added a commit that referenced this pull request May 17, 2022
ghstack-source-id: c302260
Pull Request resolved: #77396
pytorchmergebot added a commit that referenced this pull request May 19, 2022
@malfet
Copy link
Contributor

malfet commented May 19, 2022

By the way, look like this should have been marked as BC breaking?
Also, this failure (at least on compilation level) could have been easily avoided by adding default value to new argument

@albanD
Copy link
Collaborator

albanD commented May 19, 2022

By the way, look like this should have been marked as BC breaking?

I don't think so, torchdistx is using private APIs. So this is not BC breaking.

@malfet
Copy link
Contributor

malfet commented May 19, 2022

By the way, look like this should have been marked as BC breaking?

I don't think so, torchdistx is using private APIs. So this is not BC breaking.

Hmm, are all APIs in C10 consider to be private?

@george-qi
Copy link
Contributor Author

@george-qi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@albanD
Copy link
Collaborator

albanD commented May 19, 2022

Hmm, are all APIs in C10 consider to be private?

That's a good question. At least our FC/BC guarantees don't apply to them. Just like we're "breaking" aten APIs all the time without warnings.

@george-qi george-qi reopened this May 19, 2022
@george-qi george-qi closed this May 19, 2022
@george-qi george-qi reopened this May 19, 2022
facebook-github-bot pushed a commit that referenced this pull request May 19, 2022
Summary:
Pull Request resolved: #77906

This is a re-landing of #77396 with a patch to torchdistx so that internal CI's don't break

Test Plan: CI

Reviewed By: malfet, b0noI

Differential Revision: D36493890

fbshipit-source-id: 8373f0bfa3803d2c8bc6dd5178b13205c39ea1f0
eellison pushed a commit that referenced this pull request May 20, 2022
Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors).

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request May 20, 2022
Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors).

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request May 20, 2022
Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors).

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request May 24, 2022
Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors).

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request May 24, 2022
Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors).

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request May 24, 2022
Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors).

Differential Revision: [D36618465](https://our.internmc.facebook.com/intern/diff/D36618465)

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request May 24, 2022
Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors).

Differential Revision: [D36618465](https://our.internmc.facebook.com/intern/diff/D36618465)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request May 24, 2022
Summary:
Pull Request resolved: #77684

Similar to #77396, add a way to override `device` on Tensor Subclasses by adding a bit to call into a custom, virtual `device` function. This is necessary to allow extending devices with a `meta` device to also carry along a device so as to compute device propagation in addition to other properties (FakeTensors).

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D36618465

Pulled By: eellison

fbshipit-source-id: b9dcf480b9e78cbe46c776f1244e7259d871b379
@george-qi george-qi closed this May 25, 2022
@facebook-github-bot facebook-github-bot deleted the gh/george-qi/33/head branch May 29, 2022 14:17
suo added a commit that referenced this pull request Jun 2, 2022
This follows the template in
#77396

ghstack-source-id: 9b6a786
Pull Request resolved: #78691
suo added a commit that referenced this pull request Jun 2, 2022
This follows the template in
#77396

[ghstack-poisoned]
suo added a commit that referenced this pull request Jun 2, 2022
This follows the template in
#77396

[ghstack-poisoned]
suo added a commit that referenced this pull request Jun 2, 2022
This follows the template in
#77396

[ghstack-poisoned]
suo added a commit that referenced this pull request Jun 2, 2022
This follows the template in
#77396

ghstack-source-id: c21da4c
Pull Request resolved: #78691
pytorchmergebot pushed a commit that referenced this pull request Jun 2, 2022
This follows the template in
#77396

Pull Request resolved: #78691

Approved by: https://github.com/ezyang
facebook-github-bot pushed a commit that referenced this pull request Jun 3, 2022
Summary:
This follows the template in
#77396

Pull Request resolved: #78691

Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/22b10873f375cb0d582f4567525238f716c430ca

Reviewed By: malfet, b0noI

Differential Revision: D36882361

fbshipit-source-id: 67b95f50c40b09514d958526bc6126d314cce7c5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request cla signed Merged release notes: python_frontend python frontend release notes category Reverted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants