
ptype propagation on torch functions#22235

Closed
dylanbespalko wants to merge 1 commit into pytorch:master from dylanbespalko:tensor_subclass

Conversation

@dylanbespalko
Contributor

@dylanbespalko dylanbespalko commented Jun 25, 2019

Resolves issue #17249

Added support for python subclass (ptype) propagation when calling torch functions on Tensors (Parameters are unaffected).

I have addressed and responded to the comments on the previous PR, #18610. Please visit that link to see the issues that were resolved. As suggested, the changes have been broken into two PRs:

  1. ptype propagation on torch functions.
  2. ptype propagation on numpy functions #22247

I have also fixed a bug and added tests for ptype propagation during tensor indexing operations.

A simple python script demonstrating ptype propagation is as follows:

from collections import OrderedDict

import torch


def _rebuild_subclass(type_, data, requires_grad, backward_hooks):
    param = type_(data, requires_grad)
    # NB: This line exists only for backwards compatibility; the
    # general expectation is that backward_hooks is an empty
    # OrderedDict.  See Note [Don't serialize hooks]
    param._backward_hooks = backward_hooks

    return param


class TensorSubclass(torch.Tensor):

    def __new__(cls, data=None, requires_grad=False):
        if data is None:
            data = torch.Tensor()
        self = torch.Tensor._make_subclass(cls, data, requires_grad)
        return self

    def __deepcopy__(self, memo):
        if id(self) in memo:
            return memo[id(self)]
        else:
            result = type(self)(self.data.clone(), self.requires_grad)
            memo[id(self)] = result
            return result

    def __reduce_ex__(self, proto):
        # See Note [Don't serialize hooks]
        return _rebuild_subclass, (self.__class__, self.data, self.requires_grad, OrderedDict())


class A(TensorSubclass):
    pass


class B(TensorSubclass):
    pass


class C(A, B):
    pass


if __name__ == '__main__':
    a_tensor = torch.tensor([1.0, 2.0, 3.0], dtype=torch.double, requires_grad=True)
    b_tensor = torch.tensor([4.0, 5.0, 6.0], dtype=torch.double, requires_grad=True)
    c_tensor = torch.tensor([7.0, 8.0, 9.0], dtype=torch.double, requires_grad=True)
    d_tensor = torch.ones((4, 3, 2), dtype=torch.double, requires_grad=True)
    a = A(a_tensor, requires_grad=True)
    b = B(b_tensor, requires_grad=True)
    c = C(c_tensor, requires_grad=True)
    d = C(d_tensor, requires_grad=True)
    
    print("Subclass")
    print(type(a))
    print(a.requires_grad)
    print(type(b))
    print(b.requires_grad)
    print(type(c))
    print(c.requires_grad)

    print("Torch ptype propagation")
    print(type(a + b))
    print((a + b).requires_grad)
    print(type(a + c))
    print((a + c).requires_grad)

    print("Indexing [None, ellipsis, integer, slice]")
    print(type(c[None]))
    print(type(c[...]))
    print(type(c[0]))
    print(type(c[slice(0, 1, 1)]))
    print(type(d[None, ..., 0, slice(0, 1, 1)]))

    print("bool")
    print(a == c)

@pytorchbot pytorchbot added module: autograd Related to torch.autograd, and the autograd engine in general module: internals Related to internal abstractions in c10 and ATen module: nn Related to torch.nn module: numpy Related to numpy support, and also numpy compatibility of our operators module: operators module: pybind Related to our Python bindings / interactions with other Python libraries labels Jun 25, 2019
@dylanbespalko
Contributor Author

Hi @ezyang,

I apologize for my absence. I had to take some time to pay the bills and port a lot of numpy code to PyTorch. Can you please take a look at this PR and the comments I have made on PR #18610?

@yf225 yf225 requested review from ezyang and gchanan June 26, 2019 02:19
@yf225 yf225 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 26, 2019
@dylanbespalko
Contributor Author

The performance difference was profiled on a Nvidia Jetson TX2 (may be slower than most). The results of the Fast RNN benchmarks before/after the changes are shown below:

Namespace(cnns=None, device='cuda', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)

Benchmarking LSTMs...
name avg_fwd std_fwd info_fwd avg_bwd std_bwd info_bwd
cudnn 68.82 0.3201 None 149.8 0.02839 None
cudnn 65.13 0.1833 None 149.6 0.4201 None
aten 78.98 0.09888 None 197 0.3931 None
aten 74.96 0.07285 None 194.9 0.2771 None
jit 74.54 0.6153 None 181.2 0.8036 None
jit 71.41 0.3136 None 179.4 0.6142 None
jit_premul 74.53 1.906 None 134.5 0.6629 None
jit_premul 71.37 1.878 None 133.9 0.5102 None
jit_premul_bias 76.46 1.422 None 144.5 1.12 None
jit_premul_bias 74.04 1.76 None 143.5 0.4585 None
jit_simple 73.69 0.3481 None 3.689 0.02428 None
jit_simple 70.41 0.2955 None 3.597 0.02665 None
jit_multilayer 74.22 0.3634 None 192 1.087 None
jit_multilayer 71.14 0.3437 None 190.2 0.8782 None
py 96.09 0.7538 None 215.8 0.7734 None
py 92.25 0.3408 None 209.8 0.6777 None

Benchmarking ResNets...
name avg_fwd std_fwd info_fwd avg_bwd std_bwd info_bwd
resnet18 299.3 0.2684 None 665.9 1.159 None
resnet18 297.1 0.653 None 657.3 1.252 None
resnet18_jit 299.7 0.3312 None 665.1 1.034 None
resnet18_jit 297.3 0.4165 None 656.5 1.3 None
resnet50 854.5 0.5609 None 2396 2.564 None
resnet50 852.9 0.9446 None 2408 3.049 None

There doesn't seem to be any degradation in performance as a result of these changes.

@ezyang ezyang changed the title Added support for python subclass propagation torch functions to fix #17249 ptype propagation on torch functions Jun 26, 2019
 from torch.autograd._functions import Resize
-return Resize.apply(self, sizes)
+new_tensor = Resize.apply(self, sizes)
+return torch.Tensor._make_subclass(self.__class__, new_tensor.data, new_tensor.requires_grad)
Contributor

I'm not sure if it actually matters here (since these are deprecated methods on tensor), but the loss of autograd metadata here makes me raise my eyebrows. I haven't reviewed how _make_subclass works yet, but I don't see how you could possibly preserve ResizeBackwards in this case, because you have only passed along the detached tensor and whether or not it requires grad. But that's not all the information that a Variable may have (e.g., grad_fn!)

Contributor Author

Yeah, I think I detached by mistake again. I will fix this.

def test_ptype_non_propagation(self):
p = Parameter(torch.rand((3,), dtype=torch.double, requires_grad=False))
self.assertIs((p + 1).__class__, torch.Tensor)
self.assertIsNot((p + 1).__class__, Parameter)
Contributor

Good.

pass


def _rebuild_subclass(type_, data, requires_grad, backward_hooks):
Contributor

I guess this code was cargo culted from somewhere? Might be worth saying where it came from.

Contributor Author

This was in torch/nn/parameter.py, which was the original subclass of Tensor. The comment says it is redundant, and it is. I can remove this if you want.

Contributor Author

I see the definition has now been moved to torch._utils.py

def _rebuild_parameter(data, requires_grad, backward_hooks):
    param = torch.nn.Parameter(data, requires_grad)
    # NB: This line exists only for backwards compatibility; the
    # general expectation is that backward_hooks is an empty
    # OrderedDict.  See Note [Don't serialize hooks]
    param._backward_hooks = backward_hooks

    return param

HANDLE_TH_ERRORS
static PythonArgParser parser({
"_make_subclass(PyObject* cls, Tensor data, bool require_grad=False)",
"_make_subclass(PyObject* cls, Tensor data, bool require_grad=False, bool allow_tensor_metadata_change=True)",
Contributor

@ezyang ezyang Jun 26, 2019

We need some documentation saying what allow_tensor_metadata_change does. I see it is a preexisting parameter.

Contributor Author

It was a pre-existing parameter, and it would detach the tensor because allow_tensor_metadata_change was set to False. Every tensor would detach when I called _make_subclass().

@ezyang
Contributor

ezyang commented Jun 26, 2019

@yf225 could you please review the use of allow_tensor_metadata_change in this patch?

@ezyang
Contributor

ezyang commented Jun 26, 2019

@dylanbespalko could you please give a snippet of the generated code before and after this diff?

EDIT: I guess it's the same as in #18610 (comment) ?

Contributor

@apaszke apaszke left a comment

This is some good stuff! Not a complete review, and I'd like to take a closer look before this is merged, but may I please request a benchmark which does not use a GPU-bound model? Do something like this:

def slow(x, y):
  for i in range(1000):
    x = x + y
  return x

and run this with x and y containing 1-10 elements. That will highlight the real overhead of this patch.

I'm also somewhat suspicious of the fact that the numbers you report seem to indicate that this patch makes everything faster? That seems veeeery unlikely to me, unless you have a good explanation for that.

PyObject *obj;

//Initialize on First Call
bool initialized = false;
Contributor

This should be static!

Contributor

Also, what about thread safety? Should we guard this with an std::call_once?

Contributor

This doesn't need to be thread safe because it is in Python code and therefore is GIL protected

Contributor

But yes, it definitely needs to be static lol

Contributor Author

Oops. I'll fix it.

ptype = (PyTypeObject*)base;
Py_INCREF(ptype);
// torch.nn.Parameter explicitly does not perform ptype propagation, so this should not affect torch.nn modules.
if (self != nullptr && PyObject_IsInstance(self, base) && !PyObject_IsInstance(self, parameter) && PyObject_IsInstance(self, (PyObject*)ptype)){
Contributor

Can we prefer explicit comparisons on o->ob_type instead of IsInstance calls? Those are very slow, and this is executed in the hot path.
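In Python terms, the exact-type check the reviewer suggests (comparing o->ob_type) corresponds to `type(x) is cls`, which skips the class hierarchy walk that `isinstance` performs. A minimal sketch, with illustrative class names:

```python
class Base:
    pass

class Sub(Base):
    pass

def is_exact_type(obj, cls):
    # Analogue of comparing o->ob_type against a PyTypeObject* in C:
    # an exact-type check that deliberately ignores subclasses.
    return type(obj) is cls

b, s = Base(), Sub()
assert is_exact_type(b, Base)
assert not is_exact_type(s, Base)  # a subclass fails the exact check
assert isinstance(s, Base)         # but still passes isinstance
```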

if data is None:
data = torch.Tensor()
self = torch.Tensor._make_subclass(cls, data, requires_grad)
return self
Contributor

Do you have an explanation for how, in general, you should go about defining subclasses of tensor? I see here that you had to explicitly define __new__; this is a bit magical, so it would be really helpful to know exactly why this is needed, and how the general pattern works.

Contributor Author

TensorSubclass in test_torch.py is identical to Parameter in torch/nn/parameter.py. I know you have to define __deepcopy__ and __reduce_ex__. I think __reduce_ex__ is doing something for legacy purposes, but I have included it anyway. These are tested in the unit tests.

"you had to explicitly define __new__; this is a bit magical"

__new__(cls, ...) takes cls (a class object) as the first argument and defines the constructor of a class. You pass the PyTypeObject* cls to THPVariable_make_subclass() which defines the Python (ptype) of the class.

Example:

my_class = MyTensorSubClass  # A Class (or type)

__init__(self, ...) takes self (an instance of the class) as the first argument and defines a constructor of an object.

Example:

my_obj = MyTensorSubClass()  # An instance of the Class.

The terminology is really weird here because in Python an object, class, function, method, ..., etc are all objects.
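The split between the two constructors can be sketched in plain Python (the Logged/Derived names here are illustrative, not part of the patch):

```python
class Logged:
    def __new__(cls, *args, **kwargs):
        # __new__ receives the class object (cls) and must allocate and
        # return the instance; this is the hook where a Tensor subclass
        # would hand its cls to Tensor._make_subclass.
        obj = super().__new__(cls)
        obj.created_as = cls.__name__
        return obj

    def __init__(self, value=0):
        # __init__ receives the already-allocated instance (self) and
        # only initializes its attributes.
        self.value = value

class Derived(Logged):
    pass

d = Derived(5)
assert d.created_as == "Derived"  # cls was the most derived class
assert d.value == 5
```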

pass

class C(A, B):
pass
Contributor

@ezyang ezyang Jun 26, 2019

It's really interesting that you added support for complicated class hierarchies, but it's not clear to me what good use-cases for this are. Do you have some good descriptions?

One reason I ask about this, is that I am not certain that the correct thing to do is to unconditionally wrap in the most derived subclass. For example, if I think about someone implementing "named" tensors as a tensor subclass, and then someone else separately implementing "debug" tensors as a tensor subclass, returning either a named tensor or a debug tensor as a result of combining the two is unlikely to be a useful operation. In fact, there isn't really any good thing we can do here, we should probably just error.

EDIT: I forgot that you do have examples in the bug report you sent us: #17249

EDIT 2: But actually it doesn't look like there's any nontrivial class hierarchies in your bug report.

Contributor Author

I don't need class hierarchies personally. I think I took this directly from how numpy tests overloading numpy methods using __array_ufunc__. I have no preference, but I know this way matches numpy.

Contributor Author

Example:

For class C(A, B), numpy tests that result = a + c will call the definition of C.__add__(), which inherits from A.__add__() and ignores B.__add__().
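That dispatch rule falls straight out of Python's method resolution order; a small sketch without torch (the string return values just mark which method ran):

```python
class A:
    def __add__(self, other):
        return "A.__add__"

class B:
    def __add__(self, other):
        return "B.__add__"

class C(A, B):
    pass

# C inherits __add__ from A because A precedes B in C's MRO.
assert [t.__name__ for t in C.__mro__] == ["C", "A", "B", "object"]
assert C() + C() == "A.__add__"
```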

Contributor Author

I just looked at how I use this. Since there are both Tensors and Parameters, I actually use multiple inheritance and hierarchal inheritance. Here is an example of something that I called TFTensor:

class BaseTensor(th.Tensor):
    # Used to add new methods to all my tensors at the python level.
    pass


class TFTensor(base.BaseTensor):
    # A Tensor where time and frequency are the last two dimensions.

    def __repr__(self):
        return '%s(..., %s, %s)\n%s' % (TFTensor.__name__, "t", "f", super(TFTensor, self).__repr__())

    # Several other methods are added here


class TFParameter(TFTensor, Parameter):
    # A TFTensor that is used in a neural network.
    pass

Note: BaseTensor is just a convenience that I use for adding numpy compatible methods when porting code from numpy to pytorch. I might get rid of it soon.

@ezyang
Contributor

ezyang commented Jun 26, 2019

Stepping back a moment, I think we're likely going to need to do some extra design work on this patchset (e.g., hashing out exactly what the desired semantics of ptype propagation are) before we can land it. Hopefully, we can land the numpy __array_ufunc__ patch first; if it is possible to reorder the patches so ufunc comes first, that's probably better.

EDIT: I have apparently forgotten all the details of the original proposal. Rereading it, I'll come back here.

EDIT 2: Let's discuss this some more at #17249

@ezyang
Contributor

ezyang commented Jun 27, 2019

Hi @dylanbespalko; thank you for all the work you've done on this PR. I don't want to string you along, so let me be explicit: we don't want to accept the version of this patch that only does ptype wrapping. I understand that this is helpful for your use case, but the complexity versus benefit doesn't make sense for PyTorch. What does make sense for PyTorch is a full __torch_function__ implementation (which would subsume this functionality, and also unlock a number of use cases which I've mentioned to you), but this would be a substantial project and involve a lot of close technical design with core. We probably wouldn't be able to directly reuse the code from this PR (although the ideas are very helpful--and we'd be ecstatic to credit you with initial development of this functionality.) Hopefully, you can still make use of this PR for your own projects, and I'm happy to continue to give technical comments on it to help out your use case.

If you are interested in implementing __torch_function__, we'd probably need to set up some sort of high bandwidth channel of communication. @rgommers has also expressed that some Numpy folks might be interested in helping implement this, so that might be another way we eventually resolve this so you don't have to carry this patch yourself. Thanks!

@dylanbespalko
Contributor Author

Hi @apaszke,

I apologize for my bogus GPU benchmark. I highly doubt that the performance drawbacks of this PR would show up in that situation.

I have run your benchmark function taking the best time out of 10000:

@time_it
def slow(x, y):
    for i in range(1000):
        x = x + y
    return x

The results indicate a 1.6% slowdown with isinstance(), and a 0.16% slowdown with self->ob_type checks. I also checked that for tensors with sizes above 1000 there is no performance degradation at all. Thanks for spotting this; I will make the necessary changes.
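The @time_it decorator referenced above is not shown in the thread; a best-of-N harness in that spirit might look like this (time_it here is a hypothetical sketch, not the PR's code):

```python
import time

def time_it(fn, repeat=10000):
    # Hypothetical best-of-N timer: report the minimum wall-clock time
    # over `repeat` runs, which filters out scheduler and cache noise.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# e.g. time_it(lambda: slow(x, y)) with x and y holding 1-10 elements
```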


@dylanbespalko
Contributor Author

@ezyang,

Thanks for your honesty about the likelihood of this getting approved. Ultimately, passing the decision making to either __tensor_ufunc__ or __array_ufunc__ creates a few unintended consequences that a unified __tensor_function__ could address. I'm happy to patch my code with the solution I have, but it doesn't solve this problem in the right way.

Please take my recommendations from #17249 when planning your future work. I don't need the credit, just get the result. Please feel free to keep me in the loop, but I have to focus on another problem for a bit.

@ezyang
Contributor

ezyang commented Jun 28, 2019 via email

@rgommers
Collaborator

Ultimately, passing the decision making to either __tensor_ufunc__ OR __array_ufunc__ creates a few unintended consequences that a unified __tensor_function__ could address

@dylanbespalko I've tried to understand this comment and went through all your other PRs, but I'm unclear on what in your definition the difference is between __tensor_ufunc__ and __tensor_function__. Given that PyTorch doesn't have ufuncs, and __array_ufunc__ deals with NumPy ufuncs, what would __tensor_ufunc__ not do that __tensor_function__ would?

@dylanbespalko
Contributor Author

@rgommers

My original implementation was done in python where I implemented two functions:
1. __array_ufunc__ (numpy ufunc support)
- THPVariable_result_ptype (determine the most derived python type of the input arguments).
- DO NOT search for overloads of the numpy ufunc because tensors are not np.arrays.
- Cast all input arguments to np.array using .detach().numpy()
- Call the numpy ufunc.
- Cast all outputs to the ptype being propagated.
2. __tensor_ufunc__ (torch ufunc support)
- THPVariable_result_ptype (determine the most derived python type of the input arguments).
- DO NOT search for overloads of the tensor ufunc (for now).
- Up-cast all input arguments to torch.Tensor
- Call the torch ufunc (Call the torch namespace function from the torch.Tensor method)
- Cast all outputs to the ptype being propagated.

I was not able to repeat the same solution in C++ (it may require some C++ code design changes), so I just implemented the ptype propagation in this PR and the __array_ufunc__ in #22247.

Even if I could create __array_ufunc__ and __tensor_ufunc__ methods of torch.Tensor, I still have some issues: e.g., np.array + torch.tensor and torch.tensor + np.array do different things. A unified __tensor_function__ based on numpy's __array_function__ could solve this problem.
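The common shape of both hooks described above — pick the most derived input ptype, compute on the base representation, re-wrap the result — can be sketched in plain Python (all names here are illustrative stand-ins, not the C++ implementation):

```python
class PlainTensor:
    # Stand-in for torch.Tensor: just wraps a scalar payload.
    def __init__(self, data):
        self.data = data

class A(PlainTensor):
    pass

class B(PlainTensor):
    pass

class C(A, B):
    pass

def result_ptype(types):
    # Rough analogue of THPVariable_result_ptype: return the most
    # derived type among the inputs.
    winner = types[0]
    for t in types[1:]:
        if issubclass(t, winner):
            winner = t
    return winner

def add(x, y):
    ptype = result_ptype([type(x), type(y)])   # 1. decide the ptype
    result = x.data + y.data                   # 2. compute on base data
    return ptype(result)                       # 3. re-wrap: "propagation"

out = add(A(1.0), C(2.0))
assert type(out) is C
assert out.data == 3.0
```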

@rgommers
Collaborator

rgommers commented Jul 2, 2019

Thanks @dylanbespalko, I understand what you meant now.

I still have some issues: eg) np.array + torch.tensor and torch.tensor + np.array do different things

That really is unavoidable, this is simply how Python operators work. It's the logic of how __add__ and __radd__ get called, see e.g. https://stackoverflow.com/questions/9126766/addition-between-classes-using-radd-method or https://rszalski.github.io/magicmethods/.

It's best to explicitly cast to either ndarray or Tensor (or subclasses of it) to avoid that difference in behavior.
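A minimal demonstration of that operator asymmetry, with two unrelated classes standing in for ndarray and Tensor (the Fake* names are illustrative):

```python
class FakeArray:
    def __add__(self, other):
        return "FakeArray.__add__"
    def __radd__(self, other):
        return "FakeArray.__radd__"

class FakeTensor:
    def __add__(self, other):
        return "FakeTensor.__add__"
    def __radd__(self, other):
        return "FakeTensor.__radd__"

a, t = FakeArray(), FakeTensor()
# The left operand's __add__ is tried first (and here it never returns
# NotImplemented), so operand order alone picks the implementation:
assert a + t == "FakeArray.__add__"
assert t + a == "FakeTensor.__add__"
```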

@bdsmith147

Hello, I've been following this conversation (and other related ones), and, although I don't understand all of the complications, I'm very interested in this merge request for some work I'm doing to subclass torch.Tensor and implement custom operations for the derived class. Will it be possible to resolve conflicts and merge soon? Thanks!

@gchanan
Contributor

gchanan commented Aug 26, 2019

@bdsmith147 I believe @ezyang's message: #22235 (comment) is canonical here.

@dylanbespalko
Contributor Author

@bdsmith147

I am currently using my own patch for tensor subclassing; however, it uses a py_instance() check during each tensor operation. This results in overhead that is unacceptable, especially when operating on small tensors.

My plan is to patch my own code until the named_tensors feature becomes available. That would at least give me an alternative way to categorize tensors.

@ezyang
Contributor

ezyang commented Aug 26, 2019

^-- @zou3519 you've got a user! :)

@facebook-github-bot
Contributor

Hi @dylanbespalko!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 28, 2022
@github-actions github-actions bot closed this Jun 27, 2022