Add functions add/sub/rsub/mul/div that accept a scalar as first arg #29602
xuhdev wants to merge 42 commits into gh/xuhdev/52/base from
Conversation
… first arg" Because #29238 removes special processing of `add/sub/rsub/mul/div`, calls like `torch.add(Scalar, Tensor)` will not be automatically translated to `torch.add(Tensor, Tensor)`. Therefore, we explicitly overload these ops with a scalar as their first arg. [ghstack-poisoned]
ailzhang
left a comment
Hmmmm this is a lot of overloads...
I'd take a step back and do the design carefully first.
There are two benefits of having this PR:
- Currently, in the C++ frontend, `torch::add(1, tensor)` doesn't compile. This PR will solve that.
- This PR, combined with the next in the stack, will remove the hack in `python_args_parser` where we convert a scalar directly to a tensor.
My main concern about this PR is the number of overloads added and the future maintenance burden that comes with it.
cc: @ezyang @gchanan @yf225 to comment here.
(Sorry, I'm not really requesting changes, but I want to make sure we don't proceed without a design.)
I shouldn't make the call here, but my 2c is that for binary ops, it seems acceptable to me to have
I think this is an important addition for Python/C++ API parity; otherwise there will be friction and ongoing maintenance overhead if the user has to translate Python code
Is there some other approach we can take with this, even if it's not an immediate change in this PR but something to think about for refactoring longer-term? I could imagine having functions be polymorphic in some way, where we don't have to define any functions as accepting Scalar (other than in cases where they explicitly do not accept tensors). I'd think any function that can accept a tensor parameter should also be able to accept a scalar, and have it be interpreted as a zero-dim "wrapped number" tensor without extra handling or interfaces. It's odd to me that we can call `torch.pow(tensor, scalar)` and `torch.pow(scalar, tensor)`, but not `torch.pow(scalar, scalar)`. If we rely on implementing every combination manually, we'll always have some gaps. I know it's probably uncommon to want to use `torch.pow()` on two scalars, but consistency helps keep user code generic across inputs. I imagine there are at least a few ways we could solve for this.
Also, I think these new interfaces would need to be handled in `shape_analysis.cpp`?
@nairbv We might be able to use some templating with some helpers, but I think that would likely also require other changes in how function declarations are generated...
I mean, we can talk all we want about how exactly the C++ API should be implemented, but at the end of the day, we still need to actually implement kernels for each of the overloads. This isn't evident from this PR because it's still using the old mechanism ( |
```yaml
  variants: function
  supports_named_tensor: True

- func: rsub.Scalar2(Scalar other, Tensor self, Scalar alpha=1) -> Tensor
```
Do we need rsub? Why would I call this in C++? Is it necessary for the Python binding later?
I don't think we need `rsub` (including the original ones), but we may need `rsub_`. I added them merely because there were `rsub`s before. I can attempt to delete the existing `rsub`s if they are truly unnecessary.
aten/src/ATen/native/BinaryOps.cpp
Outdated
```cpp
Tensor sub(Scalar other, const Tensor& self, Scalar alpha) {
  return native::sub(wrapped_scalar_tensor(other), self, alpha);
}
```
this is wrong, same as add.
I think this is correct? It is different from add.
Removing myself from the reviewer list. If there's something specific you want me to look at, let me know.
💊 CircleCI build failures summary and remediations: as of commit 4a47c6c (more details on the Dr. CI page):
🕵️ 12 new failures recognized by patterns. The following build failures do not appear to be due to upstream breakages.
@xuhdev are you still working on this or do you need another review round?
I'm still working on this (but slowly). I probably need some help, as it seems like a lot of JIT code is involved, which I'm unfamiliar with...
@ezyang I'm unlikely to work on this PR this week (like the past week), and I'm not sure about the week after. Do you have an urgent need for this PR? If so, I'm OK with letting someone else take it over.
Nope, not urgent, just trying to make sure people aren't blocked. It looks like you are blocked, but I'm not sure if there's someone on our side with time to help you unblock, sorry :(
@ezyang That's awesome! Let me come back to this some time later.
Hi @xuhdev! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but we do not have a signature on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as stale.
Stack from ghstack:
Because #29238 removes special processing of `add/sub/rsub/mul/div`, calls like `torch.add(Scalar, Tensor)` will not be automatically translated to `torch.add(Tensor, Tensor)`. Therefore, we explicitly overload these ops with a scalar as their first arg.