Implement copysign #46396
Conversation
[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)
- No in-place function
- No method
- Optional output
- Available: bool, int, short, long, float, double, half
- Not available: byte, char, float/double complex

TODO:
- [ ] test
- [ ] doc
- [ ] kernel_vec
```cpp
Tensor copysign_tensor_backward(Tensor grad, Tensor self, Tensor other) {
  auto result = grad * self.sign() * other.sign();
```
I am not sure that you need the self.sign() here.
It will fail gradcheck when you add it to the list in common_method_invocation.py if the formula is wrong. But I think you will need to change this.
For instance:

```python
a = tensor(-1.)
b = tensor(1.)
c = torch.copysign(a, b)  # tensor(1.)
```

The derivative of a is -1 rather than b.sign() = 1. Any thought on that?
Oh right, the derivative is -1 when the sign changes and 1 otherwise, so you need both! Agree with you.
Also there is a corner case at 0 here where sign() returns 0. What is copysign() doing for that? Is the backward formula good for this case as well?
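The backward formula under discussion can be checked numerically away from `a == 0`. A minimal sketch using NumPy (the array values here are illustrative, and the check uses `np.copysign` as a stand-in for the PyTorch op, since the two agree):

```python
import numpy as np

# Sample points away from a == 0, where copysign is differentiable.
a = np.array([-2.0, -1.0, 1.0, 2.0])
b = np.array([-1.0, 1.0, -1.0, 1.0])

# Proposed backward formula: d/da copysign(a, b) = sign(a) * sign(b),
# since copysign(a, b) == abs(a) * sign(b) for nonzero a.
analytic = np.sign(a) * np.sign(b)

# Central finite differences as an independent check.
eps = 1e-6
numeric = (np.copysign(a + eps, b) - np.copysign(a - eps, b)) / (2 * eps)

print(analytic)  # [ 1. -1. -1.  1.], matching numeric up to rounding
```

Note that both sign factors are needed: the derivative flips to -1 exactly when `a` and `b` have opposite signs.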
From reading your table of outputs, is the following correct? If it is the case, then it makes the gradient computation easier to derive for special points. So basically all the "?" in your table above should be 0, as they all correspond to this case where a~=0.
Related #38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)
- No in-place function
- No method
- Optional output
- Available: bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: byte, char, float/double complex

`c = np.copysign(a, b)`

| a | b | c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 | 1 |
| -0 | -1 | -0 | 1? |
| 0 | -1 | -0 | -1? |
| 1 | -1 | -1 | -1 |
| -1 | -0 | 1 | -1 |
| -0 | -0 | 0 | -1? |
| 0 | -0 | 0 | 1? |
| 1 | -0 | 1 | 1 |
| -1 | 0 | 1 | -1 |
| -0 | 0 | 0 | -1? |
| 0 | 0 | 0 | 1? |
| 1 | 0 | 1 | 1 |
| -1 | 1 | 1 | -1 |
| -0 | 1 | 0 | -1? |
| 0 | 1 | 0 | 1? |
| 1 | 1 | 1 | 1 |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [ ] test
- [ ] doc
- [x] ~kernel_vec~
Related #38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`

| a | b | c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 | 1 |
| -0 | -1 | -0 | 0 |
| 0 | -1 | -0 | 0 |
| 1 | -1 | -1 | -1 |
| -1 | -0 | 1 | -1 |
| -0 | -0 | 0 | 0 |
| 0 | -0 | 0 | 0 |
| 1 | -0 | 1 | 1 |
| -1 | 0 | 1 | -1 |
| -0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| -1 | 1 | 1 | -1 |
| -0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [x] test
- [x] doc
- [x] ~kernel_vec~
- [ ] torch.copysign(Number input, Tensor other)
Sounds great.
Just test if the second dtype is a float type or not and only perform that part of the test if it is.
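That suggestion could be sketched roughly as follows. The helper name is hypothetical, and NumPy stands in for the actual PyTorch test harness; the point is only the guard: signed zero exists solely for floating-point dtypes, so the signed-zero assertions run only when the second dtype is floating.

```python
import numpy as np

def check_copysign(a_dtype, b_dtype):
    # Hypothetical test helper: basic checks for any dtype pair.
    a = np.array([1, -1], dtype=a_dtype)
    b = np.array([-1, 1], dtype=b_dtype)
    c = np.copysign(a, b)
    # copysign never changes magnitude, only the sign.
    assert (np.abs(c) == np.abs(a).astype(c.dtype)).all()

    # Signed zero only exists for floating-point dtypes, so guard this part.
    if np.issubdtype(np.dtype(b_dtype), np.floating):
        z = np.copysign(np.array([1.0, -1.0]), np.array(-0.0, dtype=b_dtype))
        assert np.signbit(z).all()  # -0.0 in `b` propagates its sign bit

check_copysign(np.float64, np.float32)  # float branch exercised
check_copysign(np.int32, np.int64)      # signed-zero part skipped
```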
[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`

| a | b | c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 | 1 |
| -0 | -1 | -0 | 0 |
| 0 | -1 | -0 | 0 |
| 1 | -1 | -1 | -1 |
| -1 | -0 | -1 | 1 |
| -0 | -0 | -0 | 0 |
| 0 | -0 | -0 | 0 |
| 1 | -0 | -1 | -1 |
| -1 | 0 | 1 | -1 |
| -0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| -1 | 1 | 1 | -1 |
| -0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [x] test (cpu/gpu)
- [x] doc
- [x] ~kernel_vec~

Differential Revision: [D24401366](https://our.internmc.facebook.com/intern/diff/D24401366)
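The corrected `b = -0` rows in the table above can be reproduced directly with NumPy: copysign keys off the sign *bit* of the second argument, so `-0.0` behaves like a negative number even though `-0.0 == 0.0` compares equal.

```python
import numpy as np

# copysign takes the sign bit of the second argument, so -0.0 counts
# as negative even though it compares equal to 0.0.
print(np.copysign(-1.0, -0.0))   # -1.0 (not +1.0)
print(np.copysign(1.0, -0.0))    # -1.0
print(np.copysign(-1.0, 0.0))    # 1.0

# The sign of a zero result is only observable through its sign bit.
z = np.copysign(0.0, -1.0)
print(z == 0.0, np.signbit(z))   # True True  -> the result is -0.0
```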
Update because of the following two reasons:
Summary:
Pull Request resolved: pytorch#46396

Related pytorch#38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`

| a | b | c | a.grad |
|:--:|:--:|:--:|:----:|
| -1 | -1 | -1 | 1 |
| -0 | -1 | -0 | 0 |
| 0 | -1 | -0 | 0 |
| 1 | -1 | -1 | -1 |
| -1 | -0 | -1 | 1 |
| -0 | -0 | 0 | 0 |
| 0 | -0 | 0 | 0 |
| 1 | -0 | -1 | -1 |
| -1 | 0 | 1 | -1 |
| -0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| -1 | 1 | 1 | -1 |
| -0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [x] test (cpu/gpu)
- [x] doc
- [x] ~kernel_vec~

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24401366

Pulled By: ejguan

fbshipit-source-id: 3621c5ff74b185376a3705589983bb5197ab896d
Related #38349

Stack from ghstack:

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)

`c = np.copysign(a, b)`

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.

TODO:
- [x] ~kernel_vec~

Differential Revision: [D24401366](https://our.internmc.facebook.com/intern/diff/D24401366)