
ptype propagation on torch functions#22235

Closed
dylanbespalko wants to merge 1 commit into pytorch:master from dylanbespalko:tensor_subclass

Conversation

@dylanbespalko
Contributor

@dylanbespalko dylanbespalko commented Jun 25, 2019

Resolves issue #17249

Added support for python subclass (ptype) propagation when calling torch functions on Tensors (Parameters are unaffected).

I have addressed and responded to the comments on the previous PR, #18610. Please visit that link to see the issues that were resolved. As suggested, the changes have been broken into two PRs:

  1. ptype propagation on torch functions.
  2. ptype propagation on numpy functions #22247

I have also fixed a bug and added tests for ptype propagation during tensor indexing operations.

A simple python script demonstrating ptype propagation is as follows:

from collections import OrderedDict

import torch


def _rebuild_subclass(type_, data, requires_grad, backward_hooks):
    param = type_(data, requires_grad)
    # NB: This line exists only for backwards compatibility; the
    # general expectation is that backward_hooks is an empty
    # OrderedDict.  See Note [Don't serialize hooks]
    param._backward_hooks = backward_hooks

    return param


class TensorSubclass(torch.Tensor):

    def __new__(cls, data=None, requires_grad=False):
        if data is None:
            data = torch.Tensor()
        self = torch.Tensor._make_subclass(cls, data, requires_grad)
        return self

    def __deepcopy__(self, memo):
        if id(self) in memo:
            return memo[id(self)]
        else:
            result = type(self)(self.data.clone(), self.requires_grad)
            memo[id(self)] = result
            return result

    def __reduce_ex__(self, proto):
        # See Note [Don't serialize hooks]
        return _rebuild_subclass, (self.__class__, self.data, self.requires_grad, OrderedDict())


class A(TensorSubclass):
    pass


class B(TensorSubclass):
    pass


class C(A, B):
    pass


if __name__ == '__main__':
    a_tensor = torch.tensor([1.0, 2.0, 3.0], dtype=torch.double, requires_grad=True)
    b_tensor = torch.tensor([4.0, 5.0, 6.0], dtype=torch.double, requires_grad=True)
    c_tensor = torch.tensor([7.0, 8.0, 9.0], dtype=torch.double, requires_grad=True)
    d_tensor = torch.ones((4, 3, 2), dtype=torch.double, requires_grad=True)
    a = A(a_tensor, requires_grad=True)
    b = B(b_tensor, requires_grad=True)
    c = C(c_tensor, requires_grad=True)
    d = C(d_tensor, requires_grad=True)
    
    print("Subclass")
    print(type(a))
    print(a.requires_grad)
    print(type(b))
    print(b.requires_grad)
    print(type(c))
    print(c.requires_grad)

    print("Torch ptype propagation")
    print(type(a + b))
    print((a + b).requires_grad)
    print(type(a + c))
    print((a + c).requires_grad)

    print("Indexing [None, ellipsis, integer, slice]")
    print(type(c[None]))
    print(type(c[...]))
    print(type(c[0]))
    print(type(c[slice(0, 1, 1)]))
    print(type(d[None, ..., 0, slice(0, 1, 1)]))

    print("bool")
    print(a == c)

@pytorchbot pytorchbot added module: autograd Related to torch.autograd, and the autograd engine in general module: internals Related to internal abstractions in c10 and ATen module: nn Related to torch.nn module: numpy Related to numpy support, and also numpy compatibility of our operators module: operators module: pybind Related to our Python bindings / interactions with other Python libraries labels Jun 25, 2019
@dylanbespalko
Contributor Author

Hi @ezyang,

I apologize for my absence. I had to take some time to pay the bills and port a lot of numpy code to PyTorch. Can you please take a look at this PR and the comments I have made on PR #18610?

@yf225 yf225 requested review from ezyang and gchanan June 26, 2019 02:19
@yf225 yf225 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 26, 2019
@dylanbespalko
Contributor Author

The performance difference was profiled on a Nvidia Jetson TX2 (may be slower than most). The results of the Fast RNN benchmarks before/after the changes are shown below:

Namespace(cnns=None, device='cuda', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)

Benchmarking LSTMs...
name avg_fwd std_fwd info_fwd avg_bwd std_bwd info_bwd
cudnn 68.82 0.3201 None 149.8 0.02839 None
cudnn 65.13 0.1833 None 149.6 0.4201 None
aten 78.98 0.09888 None 197 0.3931 None
aten 74.96 0.07285 None 194.9 0.2771 None
jit 74.54 0.6153 None 181.2 0.8036 None
jit 71.41 0.3136 None 179.4 0.6142 None
jit_premul 74.53 1.906 None 134.5 0.6629 None
jit_premul 71.37 1.878 None 133.9 0.5102 None
jit_premul_bias 76.46 1.422 None 144.5 1.12 None
jit_premul_bias 74.04 1.76 None 143.5 0.4585 None
jit_simple 73.69 0.3481 None 3.689 0.02428 None
jit_simple 70.41 0.2955 None 3.597 0.02665 None
jit_multilayer 74.22 0.3634 None 192 1.087 None
jit_multilayer 71.14 0.3437 None 190.2 0.8782 None
py 96.09 0.7538 None 215.8 0.7734 None
py 92.25 0.3408 None 209.8 0.6777 None

Benchmarking ResNets...
name avg_fwd std_fwd info_fwd avg_bwd std_bwd info_bwd
resnet18 299.3 0.2684 None 665.9 1.159 None
resnet18 297.1 0.653 None 657.3 1.252 None
resnet18_jit 299.7 0.3312 None 665.1 1.034 None
resnet18_jit 297.3 0.4165 None 656.5 1.3 None
resnet50 854.5 0.5609 None 2396 2.564 None
resnet50 852.9 0.9446 None 2408 3.049 None

There doesn't seem to be any degradation in performance as a result of these changes.

@ezyang ezyang changed the title Added support for python subclass propagation torch functions to fix #17249 ptype propagation on torch functions Jun 26, 2019
 from torch.autograd._functions import Resize
-return Resize.apply(self, sizes)
+new_tensor = Resize.apply(self, sizes)
+return torch.Tensor._make_subclass(self.__class__, new_tensor.data, new_tensor.requires_grad)
Contributor

I'm not sure if it actually matters here (since these are deprecated methods on tensor), but the loss of autograd metadata here makes me raise my eyebrows. I haven't reviewed how _make_subclass works yet, but I don't see how you could possibly preserve ResizeBackwards in this case, because you have only passed along the detached tensor and whether or not it requires grad. But that's not all the information that a Variable may have (e.g., grad_fn!)

Contributor Author

Yeah, I think I detached by mistake again. I will fix this.

def test_ptype_non_propagation(self):
p = Parameter(torch.rand((3,), dtype=torch.double, requires_grad=False))
self.assertIs((p + 1).__class__, torch.Tensor)
self.assertIsNot((p + 1).__class__, Parameter)
Contributor

Good.

pass


def _rebuild_subclass(type_, data, requires_grad, backward_hooks):
Contributor

I guess this code was cargo culted from somewhere? Might be worth saying where it came from.

Contributor Author

This was in torch/nn/parameter.py, which was the original subclass of Tensor. The comment says it is redundant, and it is. I can remove this if you want.

Contributor Author

I see the definition has now been moved to torch._utils.py

def _rebuild_parameter(data, requires_grad, backward_hooks):
    param = torch.nn.Parameter(data, requires_grad)
    # NB: This line exists only for backwards compatibility; the
    # general expectation is that backward_hooks is an empty
    # OrderedDict.  See Note [Don't serialize hooks]
    param._backward_hooks = backward_hooks

    return param

HANDLE_TH_ERRORS
static PythonArgParser parser({
"_make_subclass(PyObject* cls, Tensor data, bool require_grad=False)",
"_make_subclass(PyObject* cls, Tensor data, bool require_grad=False, bool allow_tensor_metadata_change=True)",
Contributor

@ezyang ezyang Jun 26, 2019

We need some documentation saying what allow_tensor_metadata_change does. I see it is a preexisting parameter.

Contributor Author

It was a pre-existing parameter, and it would detach the tensor because allow_tensor_metadata_change was set to False. Every tensor would detach when I called _make_subclass().

@ezyang
Contributor

ezyang commented Jun 26, 2019

@yf225 could you please review the use of allow_tensor_metadata_change in this patch?

@ezyang
Contributor

ezyang commented Jun 26, 2019

@dylanbespalko could you please give a snippet of the generated code before and after this diff?

EDIT: I guess it's the same as in #18610 (comment) ?

Contributor

@apaszke apaszke left a comment

This is some good stuff! Not a complete review, and I'd like to take a closer look before this is merged, but may I please request a benchmark which does not use a GPU-bound model? Do something like this:

def slow(x, y):
  for i in range(1000):
    x = x + y
  return x

and run this with x and y containing 1-10 elements. That will highlight the real overhead of this patch.

I'm also somewhat suspicious of the fact that the numbers you report seem to indicate that this patch makes everything faster? That seems veeeery unlikely to me, unless you have a good explanation for that.

PyObject *obj;

//Initialize on First Call
bool initialized = false;
Contributor

This should be static!

Contributor

Also, what about thread safety? Should we guard this with an std::call_once?

Contributor

This doesn't need to be thread safe because it is in Python code and therefore is GIL protected

Contributor

But yes, it definitely needs to be static lol

Contributor Author

Oops. I'll fix it.

ptype = (PyTypeObject*)base;
Py_INCREF(ptype);
// torch.nn.Parameter explicitly does not perform ptype propagation, so this should not affect torch.nn modules.
if (self != nullptr && PyObject_IsInstance(self, base) && !PyObject_IsInstance(self, parameter) && PyObject_IsInstance(self, (PyObject*)ptype)){
Contributor

Can we prefer explicit comparisons on o->ob_type instead of IsInstance calls? Those are very slow, and this is executed in the hot path.
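In Python terms, the exact-type check the reviewer suggests (comparing o->ob_type) corresponds to `type(x) is cls`, which skips the class hierarchy walk that `isinstance` performs. A minimal sketch, with illustrative class names:

```python
class Base:
    pass

class Sub(Base):
    pass

def is_exact_type(obj, cls):
    # Analogue of comparing o->ob_type against a PyTypeObject* in C:
    # an exact-type check that deliberately ignores subclasses.
    return type(obj) is cls

b, s = Base(), Sub()
assert is_exact_type(b, Base)
assert not is_exact_type(s, Base)  # a subclass fails the exact check
assert isinstance(s, Base)         # but still passes isinstance
```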

if data is None:
data = torch.Tensor()
self = torch.Tensor._make_subclass(cls, data, requires_grad)
return self
Contributor

Do you have an explanation for how, in general, you should go about defining subclasses of tensor? I see here that you had to explicitly define __new__; this is a bit magical, so it would be really helpful to know exactly why this is needed, and how the general pattern works.

Contributor Author

TensorSubclass in test_torch.py is identical to Parameter in torch/nn/parameter.py. I know you have to define __deepcopy__ and __reduce_ex__. I think __reduce_ex__ is doing something for legacy purposes, but I have included it anyway. These are tested in the unit tests.

"you had to explicitly define __new__; this is a bit magical"

__new__(cls, ...) takes cls (a class object) as the first argument and defines the constructor of a class. You pass the PyTypeObject* cls to THPVariable_make_subclass() which defines the Python (ptype) of the class.

Example:

my_class = MyTensorSubClass  # A Class (or type)

__init__(self, ...) takes self (an instance of the class) as the first argument and defines a constructor of an object.

Example:

my_obj = MyTensorSubClass()  # An instance of the Class.

The terminology is really weird here because in Python an object, class, function, method, ..., etc are all objects.
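The split between the two constructors can be sketched in plain Python (the Logged/Derived names here are illustrative, not part of the patch):

```python
class Logged:
    def __new__(cls, *args, **kwargs):
        # __new__ receives the class object (cls) and must allocate and
        # return the instance; this is the hook where a Tensor subclass
        # would hand its cls to Tensor._make_subclass.
        obj = super().__new__(cls)
        obj.created_as = cls.__name__
        return obj

    def __init__(self, value=0):
        # __init__ receives the already-allocated instance (self) and
        # only initializes its attributes.
        self.value = value

class Derived(Logged):
    pass

d = Derived(5)
assert d.created_as == "Derived"  # cls was the most derived class
assert d.value == 5
```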

pass

class C(A, B):
pass
Contributor

@ezyang ezyang Jun 26, 2019

It's really interesting that you added support for complicated class hierarchies, but it's not clear to me what good use-cases for this are. Do you have some good descriptions?

One reason I ask about this, is that I am not certain that the correct thing to do is to unconditionally wrap in the most derived subclass. For example, if I think about someone implementing "named" tensors as a tensor subclass, and then someone else separately implementing "debug" tensors as a tensor subclass, returning either a named tensor or a debug tensor as a result of combining the two is unlikely to be a useful operation. In fact, there isn't really any good thing we can do here, we should probably just error.

EDIT: I forgot that you do have examples in the bug report you sent us: #17249

EDIT 2: But actually it doesn't look like there's any nontrivial class hierarchies in your bug report.

Contributor Author

I don't need class hierarchies personally. I think I took this directly from how numpy tests overloading numpy methods using __array_ufunc__. I have no preference, but I know this way matches numpy.

Contributor Author

Example:

For class C(A, B), numpy tests that result = a + c will call the definition of C.__add__(), which inherits from A.__add__() and ignores B.__add__().
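That dispatch rule falls straight out of Python's method resolution order; a small sketch without torch (the string return values just mark which method ran):

```python
class A:
    def __add__(self, other):
        return "A.__add__"

class B:
    def __add__(self, other):
        return "B.__add__"

class C(A, B):
    pass

# C inherits __add__ from A because A precedes B in C's MRO.
assert [t.__name__ for t in C.__mro__] == ["C", "A", "B", "object"]
assert C() + C() == "A.__add__"
```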

Contributor Author

I just looked at how I use this. Since there are both Tensors and Parameters, I actually use multiple inheritance and hierarchal inheritance. Here is an example of something that I called TFTensor:

class BaseTensor(th.Tensor):
    # Used to add new methods to all my tensors at the python level.
    pass


class TFTensor(base.BaseTensor):
    # A Tensor where time and frequency are the last two dimensions.

    def __repr__(self):
        return '%s(..., %s, %s)\n%s' % (TFTensor.__name__, "t", "f", super(TFTensor, self).__repr__())

    # Several other methods are added here


class TFParameter(TFTensor, Parameter):
    # A TFTensor that is used in a neural network.
    pass

Note: BaseTensor is just a convenience that I use for adding numpy compatible methods when porting code from numpy to pytorch. I might get rid of it soon.

@ezyang
Contributor

ezyang commented Jun 26, 2019

Stepping back a moment, I think we're likely going to need to do some extra design work on this patchset (e.g., hashing out exactly what the desired semantics of ptype propagation are) before we can land it. Hopefully, we can land the numpy __array_ufunc__ patch first; if it is possible to reorder the patches so ufunc comes first, that's probably better.

EDIT: I have apparently forgotten all the details of the original proposal. Rereading it, I'll come back here.

EDIT 2: Let's discuss this some more at #17249

@ezyang
Contributor

ezyang commented Jun 27, 2019

Hi @dylanbespalko; thank you for all the work you've done on this PR. I don't want to string you along, so let me be explicit: we don't want to accept the version of this patch that only does ptype wrapping. I understand that this is helpful for your use case, but the complexity versus benefit doesn't make sense for PyTorch. What does make sense for PyTorch is a full __torch_function__ implementation (which would subsume this functionality, and also unlock a number of use cases which I've mentioned to you), but this would be a substantial project and involve a lot of close technical design with core. We probably wouldn't be able to directly reuse the code from this PR (although the ideas are very helpful--and we'd be ecstatic to credit you with initial development of this functionality.) Hopefully, you can still make use of this PR for your own projects, and I'm happy to continue to give technical comments on it to help out your use case.

If you are interested in implementing __torch_function__, we'd probably need to set up some sort of high bandwidth channel of communication. @rgommers has also expressed that some Numpy folks might be interested in helping implement this, so that might be another way we eventually resolve this so you don't have to carry this patch yourself. Thanks!

@dylanbespalko
Contributor Author

Hi @apaszke,

I apologize for my bogus GPU benchmark. I highly doubt that the performance drawbacks of this PR would show up in that situation.

I have run your benchmark function taking the best time out of 10000:

@time_it
def slow(x, y):
    for i in range(1000):
        x = x + y
    return x

The results indicate a 1.6% slowdown with isinstance(), and a 0.16% slowdown with self->ob_type checks. I also checked that for tensors with sizes above 1000 there is no performance degradation at all. Thanks for spotting this; I will make the necessary changes.
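The @time_it decorator referenced above is not shown in the thread; a best-of-N harness in that spirit might look like this (time_it here is a hypothetical sketch, not the PR's code):

```python
import time

def time_it(fn, repeat=10000):
    # Hypothetical best-of-N timer: report the minimum wall-clock time
    # over `repeat` runs, which filters out scheduler and cache noise.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# e.g. time_it(lambda: slow(x, y)) with x and y holding 1-10 elements
```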


@dylanbespalko
Contributor Author

@ezyang,

Thanks for your honesty about the likelihood of this getting approved. Ultimately, passing the decision making to either __tensor_ufunc__ or __array_ufunc__ creates a few unintended consequences that a unified __tensor_function__ could address. I'm happy to patch my code with the solution I have, but it doesn't solve this problem in the right way.

Please take my recommendations from #17249 when planning your future work. I don't need the credit, just get the result. Please feel free to keep me in the loop, but I have to focus on another problem for a bit.

@ezyang
Contributor

ezyang commented Jun 28, 2019 via email

@rgommers
Collaborator

Ultimately, passing the decision making to either __tensor_ufunc__ OR __array_ufunc__ creates a few unintended consequences that a unified __tensor_function__ could address

@dylanbespalko I've tried to understand this comment and went through all your other PRs, but I'm unclear on what in your definition the difference is between __tensor_ufunc__ and __tensor_function__. Given that PyTorch doesn't have ufuncs, and __array_ufunc__ deals with NumPy ufuncs, what would __tensor_ufunc__ not do that __tensor_function__ would?

@dylanbespalko
Contributor Author

@rgommers

My original implementation was done in python where I implemented two functions:
1. __array_ufunc__ (numpy ufunc support)
- THPVariable_result_ptype (determine the most derived python type of the input arguments).
- DO NOT search for overloads of the numpy ufunc because tensors are not np.arrays.
- Cast all input arguments to np.array using .detach().numpy()
- Call the numpy ufunc.
- Cast all outputs to the ptype being propagated.
2. __tensor_ufunc__ (torch ufunc support)
- THPVariable_result_ptype (determine the most derived python type of the input arguments).
- DO NOT search for overloads of the tensor ufunc (for now).
- Up-cast all input arguments to torch.Tensor
- Call the torch ufunc (Call the torch namespace function from the torch.Tensor method)
- Cast all outputs to the ptype being propagated.

I was not able to repeat the same solution in C++ (it may require some C++ code design changes), so I just implemented the ptype propagation in this PR and the __array_ufunc__ in #22247.

Even if I could create __array_ufunc__ and __tensor_ufunc__ methods of torch.Tensor, I still have some issues: e.g., np.array + torch.tensor and torch.tensor + np.array do different things. A unified __tensor_function__ based on numpy's __array_function__ could solve this problem.
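The common shape of both hooks described above — pick the most derived input ptype, compute on the base representation, re-wrap the result — can be sketched in plain Python (all names here are illustrative stand-ins, not the C++ implementation):

```python
class PlainTensor:
    # Stand-in for torch.Tensor: just wraps a scalar payload.
    def __init__(self, data):
        self.data = data

class A(PlainTensor):
    pass

class B(PlainTensor):
    pass

class C(A, B):
    pass

def result_ptype(types):
    # Rough analogue of THPVariable_result_ptype: return the most
    # derived type among the inputs.
    winner = types[0]
    for t in types[1:]:
        if issubclass(t, winner):
            winner = t
    return winner

def add(x, y):
    ptype = result_ptype([type(x), type(y)])   # 1. decide the ptype
    result = x.data + y.data                   # 2. compute on base data
    return ptype(result)                       # 3. re-wrap: "propagation"

out = add(A(1.0), C(2.0))
assert type(out) is C
assert out.data == 3.0
```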

@rgommers
Collaborator

rgommers commented Jul 2, 2019

Thanks @dylanbespalko, I understand what you meant now.

I still have some issues: eg) np.array + torch.tensor and torch.tensor + np.array do different things

That really is unavoidable, this is simply how Python operators work. It's the logic of how __add__ and __radd__ get called, see e.g. https://stackoverflow.com/questions/9126766/addition-between-classes-using-radd-method or https://rszalski.github.io/magicmethods/.

It's best to explicitly cast to either ndarray or Tensor (or subclasses of it) to avoid that difference in behavior.
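A minimal demonstration of that operator asymmetry, with two unrelated classes standing in for ndarray and Tensor (the Fake* names are illustrative):

```python
class FakeArray:
    def __add__(self, other):
        return "FakeArray.__add__"
    def __radd__(self, other):
        return "FakeArray.__radd__"

class FakeTensor:
    def __add__(self, other):
        return "FakeTensor.__add__"
    def __radd__(self, other):
        return "FakeTensor.__radd__"

a, t = FakeArray(), FakeTensor()
# The left operand's __add__ is tried first (and here it never returns
# NotImplemented), so operand order alone picks the implementation:
assert a + t == "FakeArray.__add__"
assert t + a == "FakeTensor.__add__"
```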

@bdsmith147

Hello, I've been following this conversation (and other related ones), and, although I don't understand all of the complications, I'm very interested in this merge request for some work I'm doing to subclass torch.Tensor and implement custom operations for the derived class. Will it be possible to resolve conflicts and merge soon? Thanks!

@gchanan
Contributor

gchanan commented Aug 26, 2019

@bdsmith147 I believe @ezyang's message: #22235 (comment) is canonical here.

@dylanbespalko
Contributor Author

@bdsmith147

I am currently using my own patch for tensor subclassing; however, it uses a py_instance() check during each tensor operation. This results in overhead that is unacceptable, especially when operating on small tensors.

My plan is to patch my own code until the named_tensors feature becomes available. That would at least give me an alternative way to categorize tensors.

@ezyang
Contributor

ezyang commented Aug 26, 2019

^-- @zou3519 you've got a user! :)

@facebook-github-bot
Contributor

Hi @dylanbespalko!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 28, 2022
@github-actions github-actions bot closed this Jun 27, 2022