Implement NumPy-like function torch.fmax() & torch.fmin() #49312
Kiyosora wants to merge 8 commits into pytorch:master
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master   #49312     +/-   ##
=========================================
  Coverage   80.65%   80.65%
=========================================
  Files        1913     1913
  Lines      208121   208145      +24
=========================================
+ Hits       167859   167887      +28
+ Misses      40262    40258       -4
Hey @Kiyosora! Thanks for another PR. We'll review this shortly.
amin
max
min
fmax
These should be in the binary ops near minimum and maximum
Addressed, thanks for the correction!
Computes the element-wise maximum of :attr:`input` and :attr:`other`.

.. note::
    If one of the elements being compared is a NaN, then the non-NaN element is returned.
I think we can remove this note and just write the following:
Computes the element-wise maximum of :attr:`input` and :attr:`other`. This is like :func:`torch.maximum` except it handles NaNs differently: if exactly one of the two elements being compared is a NaN then the non-NaN element is taken as the maximum. Only if both elements are NaN is NaN propagated.
This function is a wrapper around C++'s ``std::fmax`` and is similar to NumPy's ``fmax`` function.
Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.
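To make the suggested wording concrete, here is a small illustration of the difference (a hedged sketch; the printed values follow the semantics described above):

```python
import torch

a = torch.tensor([1.0, float('nan'), float('nan')])
b = torch.tensor([2.0, 3.0, float('nan')])

print(torch.maximum(a, b))  # tensor([2., nan, nan]) -- any NaN propagates
print(torch.fmax(a, b))     # tensor([2., 3., nan]) -- NaN only when both are NaN
```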
heitorschueroff left a comment:
This PR is looking great. I left a suggestion to simplify the mask logic. It also needs to be made c10 compliant (see comments on native_functions.yaml) and needs some more tests.
}

Tensor& fmax_out(Tensor& result, const Tensor& self, const Tensor& other) {
  auto self_isnan = self.isnan();
Include a fast path for integer dtypes. Since integers cannot be NaN, you can simply return at::gt_out(result, self, other).
at::maximum_out, not at::gt_out, though
But see my latest comment below about switching to use TensorIterator for simplicity and performance.
I've improved the code to use TensorIterator instead.
auto self_isnan = self.isnan();
auto other_isnan = other.isnan();
auto both_nan = self_isnan.logical_and(other_isnan);
return at::maximum(at::where(self_isnan == both_nan, self, other.to(self.dtype())),
This can be written more efficiently as: other_isnan || (!self_isnan && self > other)
auto mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self > other));
return at::where(mask, self, other);
You'd have to do the type promotion beforehand which I think is preferable.
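As a sanity check, the proposed mask can be mirrored at the Python level (a minimal sketch assuming inputs already broadcast and promoted to a common floating dtype; `fmax_via_mask` is an illustrative name, not code from the PR):

```python
import torch

def fmax_via_mask(self, other):
    # pick `self` when `other` is NaN, or when `self` is a number and wins
    mask = other.isnan() | (~self.isnan() & (self > other))
    return torch.where(mask, self, other)
```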
You could do something similar for the out version, though torch.where does not have an out variant so you'd have to copy the result.
This is an interesting idea, but I think it'll be easier to write this using TensorIterator, which will handle broadcasting and type promotion and be much faster. See my comment below.
I've improved the code to use TensorIterator instead.
auto self_isnan = self.isnan();
auto other_isnan = other.isnan();
auto both_nan = self_isnan.logical_and(other_isnan);
return at::minimum(at::where(self_isnan == both_nan, self, other.to(self.dtype())),
Similarly, this can be written more efficiently as: other_isnan || (!self_isnan && self < other)
auto mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self < other));
return at::where(mask, self, other);
You'd have to do the type promotion beforehand which I think is preferable.
In this case I think (self >= other).logical_or(other.isnan()) would work?
However, I would like to recommend this be written like minimum and maximum to use TensorIterator. That will be much more performant and readable. It will require, however, that a gradient be added. The gradient for maximum and minimum should be a good guide. The dispatch will also have to change to no longer be a Math kernel.
The TensorIterator kernels can just be wrappers around std::fmin and std::fmax in C++. I propose we adopt Python's standard for floating-point NaN handling and treat all NaNs as identical, so in this case we don't need to worry about which NaN to return.
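For reference, the element-wise rule of C++'s std::fmax that such a kernel would wrap can be sketched in Python (illustrative only; the real kernel stays in C++):

```python
import math

def fmax_scalar(a, b):
    # mirrors std::fmax: NaN is returned only when both inputs are NaN
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)
```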
Your equation works if we can assume that comparisons with NaN always return false. This is usually the case but I believe it is not guaranteed by the C++ standard. I agree that writing this using TensorIterator will be better and more future-proof.
That's true, we should probably add a note that we assume IEEE 754.
return at::minimum(self, other);
}

Tensor& fmin_out(Tensor& result, const Tensor& self, const Tensor& other) {
The result tensor must be the last parameter (see comments on native_functions.yaml)
return at::maximum(self, other);
}

Tensor& fmax_out(Tensor& result, const Tensor& self, const Tensor& other) {
The result tensor must be the last parameter (see comments on native_functions.yaml)
ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum),
       (torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin))
a_vals = (float('inf'), -float('inf'), float('nan'), float('nan'))
b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'))
We should also test a combination of NaN and non-NaN values to ensure we cover the cases where the returned value should come from one tensor vs the other.
I think the current test does check this with a_vals and b_vals?
I think the only such comparison is between float('nan') and float('inf'), but what about the case where the only NaN value is in the second tensor?
Good point. Looks like the test could be more thorough.
I've added the case of float('inf') paired with float('nan') to cover the situation where the only NaN value is in the second tensor.
Hey @Kiyosora! Thanks for implementing torch.fmax and torch.fmin. This looks good, but there are a few changes I'd like to propose:

Let me know your thoughts. Looking forward to adding these functions to PyTorch!
(force-pushed from 9f2feb5 to d041f7f)
There are a few updates needed here:
- use iter.common_dtype() to test for whether it's a floating type or not and to dispatch
- if iter.common_dtype() is not a floating point type then just call maximum
- in the floating point kernel I don't think you need the NaN checks, I think this can just call std::fmax
Similar changes will need to be made for fmin and the CUDA code.
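A Python-level paraphrase of that dispatch structure might look like the following (a hedged sketch with illustrative names; the actual kernels live in C++/CUDA):

```python
import torch

def fmax_reference(a, b):
    common = torch.result_type(a, b)
    a, b = a.to(common), b.to(common)
    if not common.is_floating_point:
        # integer dtypes can never hold NaN, so fmax reduces to maximum
        return torch.maximum(a, b)
    # floating point: apply the std::fmax-style rule element-wise
    mask = b.isnan() | (~a.isnan() & (a > b))
    return torch.where(mask, a, b)
```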
Gotcha, I'll fix it right away.
The extension is good. Also pair a float('nan') with a real number like 0, 1, or .5.
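Following that suggestion, the test vectors could be extended roughly like this and cross-checked against NumPy (a sketch; the exact values in the PR may differ):

```python
import numpy as np
import torch

a_vals = (float('inf'), -float('inf'), float('nan'), float('nan'), float('inf'), float('nan'))
b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'), float('nan'), 0.5)
a, b = torch.tensor(a_vals), torch.tensor(b_vals)
expected = torch.from_numpy(np.fmax(a.numpy(), b.numpy()))
assert torch.allclose(torch.fmax(a, b), expected, equal_nan=True)
```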
This is the same formula as for maximum:
That seems odd since these are different functions.
Let's look at the truth table:
- if self and other are numbers, then gradient goes to self if self > other, sure
- if self and other are NaN then this comparison is false and self and other both receive gradient, which is interesting
- if exactly one of self and other is NaN then this comparison is false and both receive gradient, which seems odd
In particular, if self is a number and other is NaN then it seems like gradient should go to self but not to other? And, symmetrically, if self is NaN and other is a number then it seems like gradient should go to other and not to self?
I think we should also bias towards self receiving grad, all else being equal, and rewrite these expressions as:
(self >= other).logical_or_(other.isnan()).logical_not_()
Then the masked_fill for other would be:
(self >= other).logical_or_(other.isnan())
@heitorschueroff Does that look correct to you? What do you think, @Kiyosora?
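Concretely, those expressions would give backward masks along these lines (a sketch with illustrative names; `grad` is the incoming gradient):

```python
import torch

def fmax_backward_sketch(grad, self, other):
    # `self` keeps the gradient when it wins (ties included) or when
    # `other` is the lone NaN; `other` gets it otherwise
    keep_self = (self >= other) | other.isnan()
    grad_self = grad.masked_fill(~keep_self, 0)
    grad_other = grad.masked_fill(keep_self, 0)
    return grad_self, grad_other
```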
It does make sense to me. Thanks for pointing this out, I will fix it soon.
mruberry left a comment:
Updates look really good, @Kiyosora!
I think the implementation of the function can be simplified by relying on minimum/maximum, and I have some questions about the gradients, too. Looking forward to hearing your thoughts!
(force-pushed from d041f7f to 1863539)
Hi @mruberry, sorry for the delay, and thank you so much for your advice!

Thanks @Kiyosora! @heitorschueroff and I will take a look when we're back at work next week.
heitorschueroff left a comment:
This PR is looking great overall except for some minor fixes for which I left comments. The test failures are unrelated.
This is causing formatting issues, just a little tweak will do it as follows:
Supports :ref:`broadcasting to a common shape <broadcasting-semantics>`,
:ref:`type promotion <type-promotion-doc>`, and integer and floating-point inputs.
See comment above for fmax on the formatting.
I believe the linter will complain that the return statement is on the same line. Could you move it to its own line?
These kernels are looking good, just the same changes as for the fmax kernel.
iter.dtype -> iter.common_dtype
double -> scalar_t? Or do we need double here?
It should be scalar_t here, I just misunderstood. Thanks for the correction!
Same changes as for the fmax.
(force-pushed from 80d56a2 to ef015b6)
Thank you so much for being so patient with the code review, @heitorschueroff. 👍
facebook-github-bot left a comment:
@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@Kiyosora There were some internal issues preventing this PR from landing, and it looks like it has picked up a merge conflict now. Do you mind rebasing and fixing the conflict?
(force-pushed from ef015b6 to 87927d9)
facebook-github-bot left a comment:
@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@heitorschueroff merged this pull request in 4803eaf.
Summary: Implementing the NumPy-like functions `torch.fmax()` and `torch.fmin()` recommended in pytorch#48440

Pull Request resolved: pytorch#49312
Reviewed By: izdeby
Differential Revision: D25887246
Pulled By: heitorschueroff
fbshipit-source-id: d762eeff8b328bfcbe7d48b7ee9d2da72c249691