BUG: Allow np.percentile to operate on float16 data #29105
Conversation
Could use a release note.

I agree. Not sure how to phrase it though, suggestions are welcome.

We hit the issue that this PR fixes in hyperspy/hyperspy#3560. Could you let us know the status of this PR, and when we can expect the fix to be included in a release? Thank you! :)
mhvk left a comment
@eendebakpt - this is not really my cup of tea, but I certainly like that quite a few of the work-arounds could be removed! Still, there is a larger question about ensuring that all methods will now work with large float16 arrays.
Plus nitpicks about the tests... For the tests, perhaps add some comments about why they are being done. They do fail on current main, I presume?
```python
def test_percentile_gh_29003_Fraction(self):
    zero = Fraction(0)
    one = Fraction(0)
```
Here again I assume it was meant to be `Fraction(1)` -- though perhaps something like `nonzero = Fraction(22, 7)` would make more sense?
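For illustration, a sketch of what such a regression test could look like with the suggested value (the array contents and expected result are assumptions, not the PR's actual test):

```python
from fractions import Fraction
import numpy as np

# Hypothetical variant of the test above: exercise np.quantile with
# exact rational input, using the suggested non-trivial value.
zero = Fraction(0)
nonzero = Fraction(22, 7)

x = np.array([zero, nonzero], dtype=object)
# Passing the quantile as a Fraction keeps the computation exact:
print(np.quantile(x, Fraction(1, 2)))  # Fraction(11, 7)
```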
numpy/lib/_function_base_impl.py (outdated)
```diff
     # They are mathematically equivalent.
     'linear': {
-        'get_virtual_index': lambda n, quantiles: (n - 1) * quantiles,
+        'get_virtual_index': lambda n, quantiles: (n - np.int64(1)) * quantiles,
```
This one worries me, as it feels indirect: is this effectively to upcast `float16`/`float32` to `float64`? I can see how that solves the original issue, where what is presumably a `float16` would be multiplied with a large `n`.
But if that is the reason, should something similar not happen for all the other methods too? I think a solution that would cover them all is to ensure that `n` is an `int64` when `get_virtual_index` is called.
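To illustrate the precision problem being discussed (a minimal sketch, not code from the PR; the array length is made up):

```python
import numpy as np

n = 70_000                    # length of a hypothetical large array
q = np.float16(0.5)

# Index arithmetic done entirely in half precision overflows, since
# float16 cannot represent values above 65504:
print(np.float16(n - 1) * q)  # inf

# Making the count an int64 promotes the product to float64, which is
# the effect of the (n - np.int64(1)) change quoted above:
print(np.int64(n - 1) * q)    # 34999.5
```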
This indeed looks a bit odd. I do not recall exactly how I ended up here, but the change for the linear method is not needed, so I removed it.
mhvk left a comment
OK, that actually is even better than before -- you removed many work-arounds and introduced one place where the dtype is explicitly used, nice!
I had a comment on the changelog entry, but really, probably not worth running CI for again! (if you do, add [skip actions] [skip azp] [skip cirrus] to the commit message).
```diff
@@ -0,0 +1 @@
+* The accuracy of ``np.quantile`` and ``np.percentile`` for 16- and 32-bit floating point input data has been improved.
\ No newline at end of file
```
Would it be correct to say "by casting intermediate results to 64 bit"? (If so, I would add that!)
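For reference, a quick check (not part of the PR) of the promotion rule that makes this casting happen:

```python
import numpy as np

# Mixing an int64 into the index arithmetic promotes half- and
# single-precision intermediates to float64:
print(np.promote_types(np.int64, np.float16))  # float64
print(np.promote_types(np.int64, np.float32))  # float64
```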
Actually, let's just get it in, thanks @eendebakpt!

Thank you @eendebakpt and @mhvk! :)
* BUG: Allow np.percentile to operate on float16 data
* add an extra regression test
* add an extra regression test
* remove unused default value
* add release note
* review comments: part1
* review comments: part 2
* review comments: part 3
Hmmm, CuPy noticed this as a change in result dtype (my understanding). I have to investigate, but I suspect there may be an issue here. The intention is (I guess) to use default precision for the …
Fixes #29003
The approach taken to address the issue is to perform the index calculations with increased precision, but keep the behaviour that for `arr` of dtype `float16` the output of `np.quantile(arr, q)` is of the same type.

The behaviour of having the same output type makes sense for float input, but cannot hold for integer input. E.g. we want `np.quantile([1, 2], 0.5, method='linear')` to output 1.5. For that reason it might also be OK to just upcast the type to `float64`.

There are also some corner cases. For example, input (of either the array or the quantile) of type `Fraction`: possible output types could be either `Fraction` or `float64`.
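A small illustration of the dtype behaviour described above (a sketch; the array contents are made up, and the first call relies on this fix for large float16 arrays):

```python
import numpy as np

# Float input keeps its dtype, even for arrays longer than the range
# that float16 index arithmetic could handle (the case this PR fixes):
arr = np.ones(70_000, dtype=np.float16)
q = np.quantile(arr, 0.5)
print(q, q.dtype)                                  # 1.0 float16

# Integer input cannot keep its dtype, since interpolation is fractional:
print(np.quantile([1, 2], 0.5, method='linear'))   # 1.5 (a float64)
```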