ptype propagation on torch functions #22235
dylanbespalko wants to merge 1 commit into pytorch:master from dylanbespalko:tensor_subclass
Conversation
The performance difference was profiled on an Nvidia Jetson TX2 (which may be slower than most devices). The results of the Fast RNN benchmarks before/after the changes are shown below:

Namespace(cnns=None, device='cuda', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
Benchmarking ResNets...

There doesn't seem to be any degradation in performance as a result of these changes.
from torch.autograd._functions import Resize
return Resize.apply(self, sizes)
new_tensor = Resize.apply(self, sizes)
return torch.Tensor._make_subclass(self.__class__, new_tensor.data, new_tensor.requires_grad)
I'm not sure if it actually matters here (since these are deprecated methods on tensor), but the loss of autograd metadata here makes me raise my eyebrows. I haven't reviewed how _make_subclass works yet, but I don't see how you could possibly preserve ResizeBackwards in this case, because you have only passed along the detached tensor and whether or not it requires grad. But that's not all the information that a Variable may have (e.g., grad_fn!)
Yeah, I think I detached by mistake again. I will fix this.
def test_ptype_non_propagation(self):
    p = Parameter(torch.rand((3,), dtype=torch.double, requires_grad=False))
    self.assertIs((p + 1).__class__, torch.Tensor)
    self.assertIsNot((p + 1).__class__, Parameter)

pass
def _rebuild_subclass(type_, data, requires_grad, backward_hooks):
I guess this code was cargo culted from somewhere? Might be worth saying where it came from.
This was in torch/nn/parameter.py, which was the original subclass of Tensor. The comment says it is redundant, and it is. I can remove this if you want.
I see the definition has now been moved to torch._utils.py
def _rebuild_parameter(data, requires_grad, backward_hooks):
    param = torch.nn.Parameter(data, requires_grad)
    # NB: This line exists only for backwards compatibility; the
    # general expectation is that backward_hooks is an empty
    # OrderedDict. See Note [Don't serialize hooks]
    param._backward_hooks = backward_hooks
    return param

HANDLE_TH_ERRORS
static PythonArgParser parser({
  "_make_subclass(PyObject* cls, Tensor data, bool require_grad=False)",
  "_make_subclass(PyObject* cls, Tensor data, bool require_grad=False, bool allow_tensor_metadata_change=True)",
We need some documentation saying what allow_tensor_metadata_change does. EDIT: I see it is a pre-existing parameter.
It was a pre-existing parameter and it would detach the tensor because allow_tensor_metadata_change was set to False. Every tensor would detach when I called make_subclass().
@yf225 could you please review the use of
@dylanbespalko could you please give a snippet of the generated code before and after this diff? EDIT: I guess it's the same as in #18610 (comment)?
apaszke left a comment
This is some good stuff! Not a complete review, and I'd like to take a closer look before this is merged, but may I please request a benchmark which does not use a GPU-bound model? Do something like this:
def slow(x, y):
    for i in range(1000):
        x = x + y
    return x

and run this with x and y containing 1-10 elements. That will highlight the real overhead of this patch.
I'm also somewhat suspicious of the fact that the numbers you report seem to indicate that this patch makes everything faster? That seems veeeery unlikely to me, unless you have a good explanation for that.
PyObject *obj;

// Initialize on First Call
bool initialized = false;
Also, what about thread safety? Should we guard this with an std::call_once?
This doesn't need to be thread-safe because it is called from Python code and is therefore GIL-protected.
But yes, it definitely needs to be static lol
Oops. I'll fix it.
ptype = (PyTypeObject*)base;
Py_INCREF(ptype);
// torch.nn.Parameter explicitly does not perform ptype propagation, so this should not affect torch.nn modules.
if (self != nullptr && PyObject_IsInstance(self, base) && !PyObject_IsInstance(self, parameter) && PyObject_IsInstance(self, (PyObject*)ptype)) {
Can we prefer explicit comparisons on o->ob_type instead of IsInstance calls? Those are very slow, and this is executed in the hot path.
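The trade-off being suggested here (exact type comparison vs. full instance checks) can be illustrated at the Python level. This is an illustrative sketch, not the PR's C++ code; `Base` and `Sub` are made-up names:

```python
class Base:
    pass

class Sub(Base):
    pass

obj = Sub()

# isinstance() walks the class hierarchy, so it accepts subclasses...
assert isinstance(obj, Base)

# ...while an exact type check (the Python analogue of comparing
# o->ob_type in the C API) only matches the most-derived class.
assert type(obj) is Sub
assert type(obj) is not Base
```

The exact-type check is cheaper precisely because it skips the subclass walk, which is why it is preferred in a hot path.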
if data is None:
    data = torch.Tensor()
self = torch.Tensor._make_subclass(cls, data, requires_grad)
return self
Do you have an explanation for how, in general, you should go about defining subclasses of tensor? I see here that you had to explicitly define __new__; this is a bit magical, so it would be really helpful to know exactly why this is needed, and how the general pattern works.
TensorSubclass in test_torch.py is identical to Parameter in torch.nn.parameter.py. I know you have to define __deepcopy__, __reduce_ex__. I think __reduce_ex__ is doing something for legacy purposes, but I have included it anyways. These are tested in the unit tests.
"you had to explicitly define __new__; this is a bit magical"
__new__(cls, ...) takes cls (a class object) as the first argument and defines the constructor of a class. You pass the PyTypeObject* cls to THPVariable_make_subclass(), which defines the Python type (ptype) of the class.
Example:
my_class = MyTensorSubClass  # A class (or type)

__init__(self, ...) takes self (an instance of the class) as the first argument and defines a constructor of an object.
Example:
my_obj = MyTensorSubClass()  # An instance of the class.

The terminology is really weird here because in Python an object, class, function, method, etc. are all objects.
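The `__new__`/`__init__` split described above can be shown without torch. This is a torch-free sketch; the class name is hypothetical, and the comment marks where a real Tensor subclass would call `torch.Tensor._make_subclass`:

```python
class MyTensorSubClass:
    def __new__(cls, value):
        # __new__ receives the class object itself and must return the
        # instance; this is the hook where a Tensor subclass would call
        # torch.Tensor._make_subclass(cls, data, requires_grad).
        instance = super().__new__(cls)
        instance.created_by_new = cls.__name__
        return instance

    def __init__(self, value):
        # __init__ receives the already-constructed instance.
        self.value = value

obj = MyTensorSubClass(42)
assert obj.created_by_new == "MyTensorSubClass"
assert obj.value == 42
```

Because `__new__` decides the type of the object being built, it is the natural place to thread the subclass (ptype) through to the C++ constructor.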
pass

class C(A, B):
    pass
It's really interesting that you added support for complicated class hierarchies, but it's not clear to me what good use-cases for this are. Do you have some good descriptions?
One reason I ask about this, is that I am not certain that the correct thing to do is to unconditionally wrap in the most derived subclass. For example, if I think about someone implementing "named" tensors as a tensor subclass, and then someone else separately implementing "debug" tensors as a tensor subclass, returning either a named tensor or a debug tensor as a result of combining the two is unlikely to be a useful operation. In fact, there isn't really any good thing we can do here, we should probably just error.
EDIT: I forgot that you do have examples in the bug report you sent us: #17249
EDIT 2: But actually it doesn't look like there are any nontrivial class hierarchies in your bug report.
I don't need class hierarchies personally. I think I took this directly from how numpy tests overloading numpy methods using __array_ufunc__. I have no preference, but I know this way matches numpy.
Example:
For class C(A, B), numpy tests that result = a + c will call C.__add__(), which is inherited from A.__add__() and ignores B.__add__().
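This resolution order can be checked in plain Python; the classes below are a minimal stand-in for the numpy-style test being described:

```python
class A:
    def __add__(self, other):
        return "A.__add__"

class B:
    def __add__(self, other):
        return "B.__add__"

class C(A, B):
    pass

# The method resolution order (C, A, B, object) means C picks up
# A's __add__, not B's.
assert C.__mro__[:3] == (C, A, B)
assert C() + C() == "A.__add__"
```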
I just looked at how I use this. Since there are both Tensors and Parameters, I actually use both multiple inheritance and hierarchical inheritance. Here is an example of something that I called TFTensor:
class BaseTensor(th.Tensor):  # Used to add new methods to all my tensors at the python level.
    ...

class TFTensor(base.BaseTensor):  # A Tensor where time and frequency are the last two dimensions.
    def __repr__(self):
        return '%s(..., %s, %s)\n%s' % (TFTensor.__name__, "t", "f", super(TFTensor, self).__repr__())
    # Several other methods are added here

class TFParameter(TFTensor, Parameter):  # A TFTensor that is used in a neural network.
    pass

Note: BaseTensor is just a convenience that I use for adding numpy-compatible methods when porting code from numpy to pytorch. I might get rid of it soon.
EDIT: I have apparently forgotten all the details of the original proposal. Rereading it, I'll come back here. EDIT 2: Let's discuss this some more at #17249
Hi @dylanbespalko; thank you for all the work you've done on this PR. I don't want to string you along, so let me be explicit: we don't want to accept the version of this patch that only does ptype wrapping. I understand that this is helpful for your use case, but the complexity versus benefit doesn't make sense for PyTorch. What does make sense for PyTorch is a full If you are interested in implementing
Hi @apaszke, I apologize for my bogus GPU benchmark. I highly doubt that the performance drawbacks of this PR would show up in that situation. I have run your benchmark function, taking the best time out of 10000:

@time_it
def slow(x, y):
    for i in range(1000):
        x = x + y
    return x

The results indicate there is a 1.6% slowdown with isinstance(), and a 0.16% slowdown with self->ob_type checks. I also checked that for tensors with size above 1000 there is no performance degradation at all. Thanks for spotting this, I will make the necessary changes.
Thanks for your honesty on the likelihood of this getting approved. Ultimately, passing the decision making to either Please take my recommendations from #17249 when planning your future work. I don't need the credit, just get the result. Please feel free to keep me in the loop, but I have to focus on another problem for a bit.
Excerpts from dylanbespalko's message of 2019-06-27 22:09:17 -0700:
Please take my recommendations from #17249 (comment) when planning your future work. I don't need the credit, just get the result. Please feel free to keep me in the loop, but I have to focus on another problem for a bit.
Absolutely, we will definitely keep it in mind!
@dylanbespalko I've tried to understand this comment and went through all your other PRs, but I'm unclear on what in your definition the difference is between
My original implementation was done in Python, where I implemented two functions: I was not able to repeat the same solution in C++ (it may require some C++ code design changes), so I just implemented the ptype propagation in this PR and the Even if I could create
Thanks @dylanbespalko, I understand what you meant now.
That really is unavoidable, this is simply how Python operators work. It's the logic of how It's best to explicitly cast to either
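The Python operator rule being alluded to can be shown without torch: when the right-hand operand's type is a subclass of the left's and overrides the reflected method, Python consults the subclass first. The class names below are illustrative stand-ins, not the real torch classes:

```python
class Tensor:
    def __add__(self, other):
        return "Tensor.__add__"

class Parameter(Tensor):
    def __radd__(self, other):
        return "Parameter.__radd__"

t, p = Tensor(), Parameter()

# Normally the left operand's __add__ wins...
assert t + t == "Tensor.__add__"

# ...but because Parameter subclasses Tensor and overrides __radd__,
# Python tries the subclass's reflected method first.
assert t + p == "Parameter.__radd__"
```

This built-in subclass priority is why mixing subclasses in one expression can produce surprising result types, and why explicit casting is the safe way to pick the type you want.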
Hello, I've been following this conversation (and other related ones), and, although I don't understand all of the complications, I'm very interested in this merge request for some work I'm doing to subclass torch.Tensor and implement custom operations for the derived class. Will it be possible to resolve conflicts and merge soon? Thanks!
@bdsmith147 I believe @ezyang's message: #22235 (comment) is canonical here.
I am currently using my own patch for tensor subclassing, however it uses a My plan is to patch my own code until the
^-- @zou3519 you've got a user! :)
Hi @dylanbespalko! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as

Resolves issue #17249
Added support for python subclass (ptype) propagation when calling torch functions on Tensors (Parameters are unaffected).
I have addressed and responded to the comments on the previous PR, #18610. Please visit that link to see the issues that were resolved. As suggested, the changes have been broken into two PRs:
I have also fixed a bug and added tests for ptype propagation during tensor indexing operations.
A simple python script demonstrating ptype propagation is as follows:
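The demonstration script itself did not survive in this export. As a stand-in, here is a pure-Python sketch of the propagation rule this PR implements: results of operations on a Tensor subclass are re-wrapped in that subclass, while Parameter deliberately opts out. `MiniTensor`, `SubTensor`, and `MiniParameter` are hypothetical stand-ins, not torch classes:

```python
class MiniTensor:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        other_value = other.value if isinstance(other, MiniTensor) else other
        result = MiniTensor(self.value + other_value)
        # ptype propagation: re-wrap the result in the most-derived
        # subclass, unless that subclass is a Parameter, which opts out.
        cls = type(self)
        if not issubclass(cls, MiniParameter):
            result.__class__ = cls
        return result

class SubTensor(MiniTensor):
    pass

class MiniParameter(MiniTensor):
    pass

assert type(SubTensor(1) + 1) is SubTensor       # subclass propagates
assert type(MiniParameter(1) + 1) is MiniTensor  # Parameter does not
```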