Conversation

@eendebakpt
Contributor

@eendebakpt eendebakpt commented May 31, 2025

Fixes #29003

The approach taken to address the issue is to perform the index calculations with increased precision, while keeping the behaviour that for an arr of dtype float16 the output of np.quantile(arr, q) has the same dtype.
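To illustrate the overflow (a minimal sketch, not code from the PR; 65504 is float16's largest finite value):

```python
import numpy as np

# float16 cannot represent values above 65504, so computing the virtual
# index (n - 1) * q directly in float16 overflows for large arrays.
n = 100_000
q = np.float16(0.5)
print(np.float16(n - 1) * q)  # inf: the index calculation overflows
print(np.float64(n - 1) * q)  # 49999.5: fine with increased precision
```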

Preserving the input type makes sense for float input, but it cannot hold for integer input: e.g. we want np.quantile([1, 2], 0.5, method='linear') to output 1.5. For that reason it might also be acceptable to simply upcast the type to float64.
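A small illustration of the intended dtype behaviour (a sketch, assuming the behaviour after this PR):

```python
import numpy as np

a16 = np.arange(10, dtype=np.float16)
print(np.quantile(a16, 0.5).dtype)  # float16: float input dtype is preserved
print(np.quantile([1, 2], 0.5))     # 1.5 as float64: integer input must upcast
```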

There are also some corner cases, for example input (of either the array or the quantile) of type Fraction; the output type could then be either Fraction or float64.
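A hypothetical sketch of that corner case (the names and the choice of q are illustrative only):

```python
from fractions import Fraction
import numpy as np

# Exact rationals end up in an object array; whether the result is a
# Fraction or a float64 is exactly the open question above.
arr = np.array([Fraction(0), Fraction(1)], dtype=object)
result = np.quantile(arr, 0.5)  # the quantile q could itself be a Fraction
print(result, type(result))
```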

@eendebakpt eendebakpt force-pushed the percentile_float16 branch from 2f15050 to 74a93b4 Compare May 31, 2025 22:46
@eendebakpt eendebakpt force-pushed the percentile_float16 branch from 74a93b4 to 6766c8d Compare June 1, 2025 18:06
@eendebakpt eendebakpt changed the title Draft: BUG: Allow np.percentile to operate on float16 data BUG: Allow np.percentile to operate on float16 data Jun 1, 2025
@charris
Member

charris commented Jun 3, 2025

Could use a release note.

@eendebakpt
Contributor Author

> Could use a release note.

I agree. Not sure how to phrase it though, suggestions are welcome.

@ericpre

ericpre commented Nov 8, 2025

We hit the issue that this PR fixes in hyperspy/hyperspy#3560. Could you let us know the status of this PR, and when we can expect the fix to be included in a release? Thank you! :)

@eendebakpt eendebakpt requested a review from mhvk November 8, 2025 11:38
@charris charris added this to the 2.4.0 release milestone Nov 8, 2025
Contributor

@mhvk mhvk left a comment

@eendebakpt - this is not really my cup of tea, but I certainly like that quite a few of the work-arounds could be removed! Still, there is a larger question about ensuring that all methods will now work with large float16 arrays.

Plus nitpicks about the tests... For the tests, perhaps add some comments about why they are being done. They do fail on current main, I presume?
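As a sketch of the kind of regression check being discussed (hypothetical, not the PR's actual test; the failure on unfixed main is per the linked issue report):

```python
import numpy as np

# A float16 array long enough that the virtual index (n - 1) * q
# would overflow if computed in float16 (n > 65504).
arr = np.ones(70_000, dtype=np.float16)
result = np.percentile(arr, 50)
assert np.isfinite(result)           # fails on pre-fix main, per the report
assert result.dtype == np.float16    # the input dtype is preserved
```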


def test_percentile_gh_29003_Fraction(self):
    zero = Fraction(0)
    one = Fraction(0)
Contributor

Here again I assume it was meant to be Fraction(1) -- though perhaps something like nonzero = Fraction(22, 7) would make more sense?

    # They are mathematically equivalent.
    'linear': {
-       'get_virtual_index': lambda n, quantiles: (n - 1) * quantiles,
+       'get_virtual_index': lambda n, quantiles: (n - np.int64(1)) * quantiles,
Contributor

This one worries me, as it feels indirect: is this effectively to upcast float16/32 to float64? I can see how that solves the original issue, where what is presumably a float16 would be multiplied with a large n.

But if that is the reason, should something similar not happen for all the other methods too? I think a solution that would cover them all is to ensure that n is an int64 when get_virtual_index is called.
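For reference, a sketch of the promotion at play under NumPy 2.x (NEP 50) rules; this is my reading of why the np.int64(1) helps, using plain NumPy behaviour rather than code from the PR:

```python
import numpy as np

# An int64 operand promotes float16/float32 quantiles up to float64,
# because int64 values do not fit in the smaller float types.
print(np.result_type(np.int64, np.float16))  # float64
print(np.result_type(np.int64, np.float32))  # float64

q = np.float16(0.5)
print(((np.int64(100_000) - np.int64(1)) * q).dtype)  # float64
```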

Contributor Author

This indeed looks a bit odd. I do not recall exactly how I ended up here, but the case for method linear is not needed so I removed it.

Contributor

@mhvk mhvk left a comment

OK, that actually is even better than before: it removed many work-arounds and introduced only one place where the dtype is explicitly used, nice!

I had a comment on the changelog entry, but really, probably not worth running CI for again! (if you do, add [skip actions] [skip azp] [skip cirrus] to the commit message).

@@ -0,0 +1 @@
+* The accuracy of ``np.quantile`` and ``np.percentile`` for 16- and 32-bit floating point input data has been improved.
\ No newline at end of file
Contributor

Would it be correct to say "by casting intermediate results to 64 bit"? (If so, I would add that!)

@mhvk
Contributor

mhvk commented Nov 9, 2025

Actually, let's just get it in, thanks @eendebakpt!

@mhvk mhvk merged commit 884aec9 into numpy:main Nov 9, 2025
77 checks passed
@ericpre

ericpre commented Nov 9, 2025

Thank you @eendebakpt and @mhvk! :)

cakedev0 pushed a commit to cakedev0/numpy that referenced this pull request Dec 5, 2025
* BUG: Allow np.percentile to operate on float16 data

* add an extra regression test

* add an extra regression test

* remove unused default value

* add release note

* review comments: part1

* review comments: part 2

* review comments: part 3
IndifferentArea pushed a commit to IndifferentArea/numpy that referenced this pull request Dec 7, 2025
@seberg
Member

seberg commented Jan 6, 2026

Hmmm, CuPy noticed this as a change in result dtype (my understanding). I have to investigate, but I suspect there may be an issue here. The intention is (I guess) to use default precision for the q computation but later use the weak one, but this now disregards the precision of q itself in the final result.
I.e. the old one effectively used result_type(q.dtype, arr.dtype, 0.0) (via true-divide resolution, though). The code here just uses arr.dtype.
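A sketch of the dtype discrepancy being described (my reading of the comment, using plain promotion rules):

```python
import numpy as np

arr = np.arange(10, dtype=np.float16)
q = np.float64(0.5)
# Old effective rule, via true-divide resolution: q's dtype participates.
print(np.result_type(q.dtype, arr.dtype, 0.0))  # float64
# New behaviour: only the array's dtype determines the result.
print(arr.dtype)                                # float16
```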


Successfully merging this pull request may close these issues.

BUG: np.percentile fails with internal overflow when using float16 input on large arrays
