
Add __tensor_wrap__ method similar to numpy __array_wrap__ #17249 #18610

Closed

dylanbespalko wants to merge 11 commits into pytorch:master from dylanbespalko:master

Conversation

@dylanbespalko
Contributor

Added __array_ufunc__ for Tensor Python subclass type propagation during numpy function calls.
Added result_ptype for Tensor Python subclass type propagation during torch function calls.
Added a workaround so that torch.nn.Parameter does not participate in Python type propagation.
Added test_ptype_propagation to test/test_torch.py.
Added test_ptype_non_propagation to test/test_torch.py.

Example

class A(th.Tensor):
    ...
class B(th.Tensor):
    ...
class C(A, B):
    ...

a = A(th.ones((1,), dtype=th.double, requires_grad=False))  # -> A
c = C(th.ones((1,), dtype=th.double, requires_grad=False))  # -> C
a + c  # -> C

p = th.nn.Parameter(th.randn(5, 5))
print(p + 1)  # -> Tensor
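For intuition, the propagation rule can be mimicked with plain Python classes, no torch required. This is only a sketch: the `result_ptype` helper below is a hypothetical stand-in for the C++ `THPVariable_result_ptype` added by this PR, not its actual implementation.

```python
class Tensor: ...
class A(Tensor): ...
class B(Tensor): ...
class C(A, B): ...

def result_ptype(*operands):
    """Return the most derived Tensor subclass among the operands."""
    ptype = Tensor
    for op in operands:
        # Assumes simple inheritance: a more derived type replaces the
        # current candidate only if it subclasses it.
        if isinstance(op, Tensor) and issubclass(type(op), ptype):
            ptype = type(op)
    return ptype

print(result_ptype(A(), C()).__name__)       # C
print(result_ptype(Tensor(), A()).__name__)  # A
```

The result of a mixed-operand call takes the most derived subclass, so `a + c` above yields a `C`.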

@ezyang
Contributor

ezyang commented Apr 5, 2019

Could you post what the autogenerated code looks like before and after the change? (Just for one sample, representative function.)

@dylanbespalko
Contributor Author

dylanbespalko commented Apr 5, 2019

Hi @ezyang,

Here is a method that uses both the static namedtuple return type and the dynamic ptype subclass return type. I have to check the Python type on each call because the subclass type is only determined at runtime.

Before: python_variable_methods.cpp

static PyObject * THPVariable_max(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "max()",
    "max(Tensor other)",
    "max(int64_t dim, bool keepdim=False)",
  }, /*traceable=*/true);
  auto& self = reinterpret_cast<THPVariable*>(self_)->cdata;
  ParsedArgs<3> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);

  static PyStructSequence_Field fields0[] = {
    {"values", ""}, {"indices", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.max", nullptr,
    fields0, 2
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    type0.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
    namedtuple_type_initialized0 = true;
  }
  if (r.idx == 0) {
    return wrap(dispatch_max(self));
  } else if (r.idx == 1) {
    return wrap(dispatch_max(self, r.tensor(0)));
  } else if (r.idx == 2) {
    return wrap(&type0, dispatch_max(self, r.toInt64(0), r.toBool(1)));
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

After: python_variable_methods.cpp

static PyObject * THPVariable_max(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "max()",
    "max(Tensor other)",
    "max(int64_t dim, bool keepdim=False)",
  }, /*traceable=*/true);
  auto& self = reinterpret_cast<THPVariable*>(self_)->cdata;
  ParsedArgs<3> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  auto *ptype = THPVariable_result_ptype(self_, args); // NEW!
  static PyStructSequence_Field fields0[] = {
    {"values", ""}, {"indices", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.max", nullptr,
    fields0, 2
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    type0.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
    namedtuple_type_initialized0 = true;
  }
  if (r.idx == 0) {
    return wrap(ptype, dispatch_max(self)); // NEW ARGUMENT!
  } else if (r.idx == 1) {
    return wrap(ptype, dispatch_max(self, r.tensor(0)));
  } else if (r.idx == 2) {
    return wrap(&type0, dispatch_max(self, r.toInt64(0), r.toBool(1)));
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

Aside:
I keep trying to pull the latest changes from master, but it's been a week since I could pull something that compiles and passes the unit tests. I also could not use your ghstack tool because it won't work on a fork of the repository (I don't have contributor access to pytorch). I will keep pulling the changes from master until it works.

@apaszke
Contributor

apaszke commented Apr 7, 2019

Also, we should carefully benchmark this change before it goes in.

@ezyang
Contributor

ezyang commented Apr 8, 2019

I keep trying to pull the latest changes from master, but it's been a week since I could pull something that compiles and passes the unit tests

Ah, that's because you have a bunch of spurious submodule changes in your PR. If you revert them that will solve your problem, e.g., as per https://stackoverflow.com/questions/7718780/how-to-undo-git-submodule-update

@ezyang
Contributor

ezyang commented Apr 9, 2019

@dylanbespalko Also, I'll get you set up with ghstack access shortly

@ezyang added labels on Apr 9, 2019: module: internals (Related to internal abstractions in c10 and ATen), module: pybind (Related to our Python bindings / interactions with other Python libraries), module: performance (Issues related to performance, either of kernel code or framework glue)

ptype = (PyTypeObject*)base;
Py_INCREF(ptype);
if (self != NULL && PyObject_IsInstance(self, base) && !PyObject_IsInstance(self, parameter) && PyObject_IsInstance(self, (PyObject*)ptype)){
Contributor

I expected to see some sort of fast path here for the case of Tensor and Parameter. Also, I see you have special-cased Parameter here, and it would be good if it were described what exactly this special case is doing (and why we actually need to special-case Parameter in this way; it's not obvious to me that Parameter should be special-cased here.)

Contributor

For example, the Numpy __array_ufunc__ has a special case for the case when the ufunc is identical to ndarray.__array_ufunc__, which I believe lets you get "Parameter" like behavior, because Parameter inherits the ufunc from Tensor, which creates Tensors (not Parameters!)

Contributor

(It is difficult for me to argue about this, because there is not a spec of the desired behavior clearly written down anywhere!)

Contributor Author

"I expected to see some sort of fast path here..."

A: It's not a fast path. The simplest implementation has to look at the Python type of every input argument as follows:

  1. self (the 1st argument).
  2. The args tuple (the 2nd through nth arguments).

I have assumed simple inheritance, therefore I only analyze the ptype of each arg from left to right to find the most derived subclass of either np.ndarray or torch.Tensor.

"Also, I see you have special-cased Parameter here...."

I will add comments saying that torch.nn.Parameter explicitly does not perform ptype propagation. This is because:

  1. Parameters designate inputs to the neural network.
  2. All other derived quantities should upcast to torch.Tensor so that they are not treated as tunable parameters in the neural network.

I return the most derived subclass of Tensor that is not a Parameter.

Eg. Parameter + Tensor_Subclass = Tensor_Subclass.
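A pure-Python sketch of this exclusion rule (the class names below are hypothetical stand-ins for the torch types, not the PR's actual code):

```python
class Tensor: ...
class Parameter(Tensor): ...         # stand-in for torch.nn.Parameter
class TensorSubclass(Tensor): ...    # some user-defined Tensor subclass

def result_ptype(*operands):
    """Most derived Tensor subclass among the operands, never Parameter."""
    ptype = Tensor
    for op in operands:
        t = type(op)
        if (isinstance(op, Tensor)
                and not issubclass(t, Parameter)  # Parameter never propagates
                and issubclass(t, ptype)):
            ptype = t
    return ptype

# Parameter + TensorSubclass -> TensorSubclass
print(result_ptype(Parameter(), TensorSubclass()).__name__)  # TensorSubclass
```

Because Parameter operands are skipped, a Parameter combined only with plain Tensors falls back to Tensor.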

Contributor Author

"For example, the Numpy array_ufunc has a special case for the case when the ufunc is identical to ndarray.array_ufunc, which I believe lets you get "Parameter" like behaviour"

This could arguably go two ways:

  1. Parameter + Tensor_Subclass = Tensor_Subclass. This is the desirable behaviour for me, but I need to process the Python type of every input arg.
  2. Parameter + Tensor_Subclass = Tensor. This would allow for a fast path if, for example, the first argument was a Parameter, but I can't see this being much faster.

Contributor Author

"(It is difficult for me to argue about this, because there is not a spec of the desired behavior clearly written down anywhere!)"

Yes, I'm conflicted about the implementation of Parameter. I wish the difference between a Tensor and a Parameter were a boolean flag stored inside the metadata of the tensor, but I'm sure others will disagree.

Contributor

Let's talk about this more on #22235. I don't mind Parameter too much but subclass propagation rules are a bit funny.

END_HANDLE_TH_ERRORS
}

// Implements the default __array_ufunc__ python subclass type propagation. Used in torch._C._TensorBase.
Contributor

We need some link to the specification of what __array_ufunc__ does, or a recapitulation of its spec here.

Contributor Author

"We need some link to the specification of what array_ufunc does, or a recapitulation of its spec here."

__array_ufunc__ traditionally does three things:

  1. Cast all input args and output kwargs to the np.ndarray base class.
  2. Look for any subclass overrides of the ufunc being called (eg. did the subclass override np.add?).
  3. Cast all result args to the np.ndarray subclass.

When __array_ufunc__ is implemented inside torch.Tensor, we should note that torch.Tensor is NOT a subclass of np.ndarray. It should do the following:

  1. Cast all input and output args to the np.ndarray base class.
  2. DO NOT look for any subclass overrides of the ufunc being called (Tensor is NOT a subclass of np.ndarray and cannot override ufuncs).
  3. Cast all result args to the Tensor subclass (the ptype).

My implementation of __array_ufunc__ does not support subclass overrides of numpy ufuncs because Tensor is not a subclass of np.ndarray.
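The traditional three steps can be sketched in pure numpy for the ndarray-subclass case (the class name `MyArray` is hypothetical and not part of this PR; this is a minimal illustration of the cast-down / dispatch / cast-up pattern, not the PR's Tensor implementation):

```python
import numpy as np

class MyArray(np.ndarray):
    """Toy ndarray subclass showing the cast-down / dispatch / cast-up pattern."""
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # 1. Cast all ndarray inputs down to the base class.
        cast = [np.asarray(x) if isinstance(x, np.ndarray) else x
                for x in inputs]
        # 2. Run the ufunc on plain ndarrays (no subclass override lookup here).
        result = getattr(ufunc, method)(*cast, **kwargs)
        # 3. Cast any ndarray result back up to the subclass.
        if isinstance(result, np.ndarray):
            return result.view(MyArray)
        return result

a = np.ones(3).view(MyArray)
b = a + 1  # dispatches through MyArray.__array_ufunc__
print(type(b).__name__)  # MyArray
```

In the Tensor case, step 3 would instead rebuild the result as the resolved torch.Tensor subclass, since Tensor cannot use ndarray views.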

Contributor

OK, don't tell me here, put this in the code, if you can :)

if (PyObject_IsInstance(obj, base)){
auto& var = ((THPVariable*)obj)->cdata;
result_requires_grad |= var.requires_grad();
var.detach_();
Contributor

An in-place detach as is done here is unlikely to be the desired semantics. If we pass a tensor through numpy we should end up with a Tensor which errors when you try to backward() through it (since the autograd is not defined). No mutations to inputs should occur.

Contributor Author

"An in-place detach as is done here is unlikely to be the desired semantics. If we pass a tensor through numpy we should end up with a Tensor which errors when you try to backward() through it (since the autograd is not defined). No mutations to inputs should occur."

I agree. I have fixed the mistake.

// Up-Cast Results to ptype
if (((PyUFuncObject*)ufunc)->nout == 1){
if (PyTypeNum_ISBOOL(PyArray_DESCR((PyArrayObject*)result)->type_num)){
//Workaround, torch has no built-in bool tensor
Contributor

We do now! cc @izdeby

obj = PyTuple_GET_ITEM(results, i);
}
auto var = torch::utils::tensor_from_numpy(obj);
result_requires_grad &= at::is_floating_point(var);
Contributor

This looks... questionable.

Contributor Author

"This looks... questionable."

I totally agree. I have fixed this so that requires_grad=False when calling a numpy __array_ufunc__, because autograd is not possible. This was a bogus workaround to pass test_multiplication_numpy_scalar in test_torch.py.


Py_DECREF(in_args);
Py_DECREF(out_args);
Py_DECREF(ptype);
Py_DECREF(base);
Contributor

Explicit refcounting is giving me hives. We have C++ helper classes for doing RAII on these, please use them!

Contributor Author

I have cleaned up this function so that it uses pybind11 whenever possible.

@ezyang
Contributor

ezyang commented Apr 10, 2019

It seems like there are two orthogonal features in this patchset, which could be merged separately. Is that true?

@dylanbespalko
Contributor Author

Thanks @ezyang for looking at my code. I will look through these tomorrow. Sorry about the explicit reference counting; I will look into pybind11.

@ezyang ezyang self-requested a review April 19, 2019 22:01
@dylanbespalko
Contributor Author

@ezyang,

"It seems like there are two orthogonal features in this patchset, which could be merged separately. Is that true?"

Yes. I will split:

  1. PyTorch tensor ptype propagation. PR: ptype propagation on torch functions #22235
  2. Numpy tensor ptype propagation via an __array_ufunc__ implementation. (Coming soon.)

@ezyang
Contributor

ezyang commented Jun 26, 2019

Closing this PR as obsoleted by #22247 and #22235

Labels

module: internals (Related to internal abstractions in c10 and ATen)
module: performance (Issues related to performance, either of kernel code or framework glue)
module: pybind (Related to our Python bindings / interactions with other Python libraries)
open source
