
Conversation

@Tontonio3
Contributor

Fixes issue #28589

np.quantile now raises errors if:

  • All weights are zero
  • At least one weight is np.nan
  • At least one weight is np.inf
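
A quick illustration of the intended behavior (a sketch; weights require method="inverted_cdf", and the exact error messages settled during review):

import numpy as np

arr = [1, 2, 3, 4]
np.quantile(arr, 0.5, weights=[0, 0, 0, 0], method="inverted_cdf")       # ValueError
np.quantile(arr, 0.5, weights=[1, np.nan, 1, 1], method="inverted_cdf")  # ValueError
np.quantile(arr, 0.5, weights=[1, np.inf, 1, 1], method="inverted_cdf")  # ValueError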

Contributor

@tylerjereddy tylerjereddy left a comment

Looks like a bunch of folks tried to fix this at the same time. This PR deals with more cases than the others, but handling NaN/Inf is a bit messier maybe. The other two PRs recently opened are:

I provided initial reviews to each since they are from first-time contributors, but we should probably consolidate one way or another.

    raise ValueError("At least one weight must be non-zero")
if weights.dtype != object:
    if np.any(np.isinf(weights)):
        raise ValueError("Weights must be non-infinite")
Contributor

nit: non-infinite -> finite, probably

with pytest.raises(ValueError) as ex:
    a = np.quantile(arr, q, weights=wgt, method=m)
assert "Weights must be non-infinite" in str(ex)
wgt[i] = 1
Contributor

Probably sufficient to just check a single index rather than all of them. For the message check, I think we usually prefer using the match argument built into pytest.raises these days.
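
For example (a minimal sketch; the exact message should match whatever the final code raises):

import numpy as np
import pytest

wgt = np.ones(4)
wgt[0] = np.inf  # checking a single bad index is enough
with pytest.raises(ValueError, match="non-infinite"):
    np.quantile([1, 2, 3, 4], 0.5, weights=wgt, method="inverted_cdf")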

wgt[i] = np.inf
with pytest.raises(ValueError) as ex:
    a = np.quantile(arr, q, weights=wgt, method=m)
assert "Weights must be non-infinite" in str(ex)
Contributor

could parametrize the test to check one weights array with a few NaNs and another with just 1, but probably don't need these two exhaustive loops

wgt[i] = np.nan
with pytest.raises(ValueError) as ex:
    a = np.quantile(arr, q, weights=wgt, method=m)
assert "At least one weight is nan" in str(ex)
Contributor

Could also probably just parametrize over np.nan and np.inf + the expected message above for concision (and same comment about not needing the exhaustive loops).
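
A sketch of what that consolidation could look like (the test and parameter names are illustrative, and the messages assume the wording in this PR):

import numpy as np
import pytest

@pytest.mark.parametrize("bad_value, err_msg", [
    (np.nan, "At least one weight is nan"),
    (np.inf, "Weights must be non-infinite"),
])
def test_quantile_invalid_weights(bad_value, err_msg):
    wgt = np.ones(4)
    wgt[0] = bad_value  # one bad entry suffices; no exhaustive loop over indices
    with pytest.raises(ValueError, match=err_msg):
        np.quantile([1, 2, 3, 4], 0.5, weights=wgt, method="inverted_cdf")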

Contributor Author

Thanks, will do

@Tontonio3
Contributor Author

Tontonio3 commented Mar 28, 2025

but handling NaN/Inf is a bit messier maybe

I'll try to make it better, but np.quantile accepting dtype=object arrays makes this difficult, because np.isinf and np.isnan do not work with dtype=object arrays, which makes it hard to handle elegantly. I should probably open a pull request about this tbh.
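
For reference, the failure mode being described (these ufuncs do not support object dtype):

import numpy as np

w = np.array([1.0, np.nan, 2.0], dtype=object)
np.isnan(w)
# TypeError: ufunc 'isnan' not supported for the input types, ...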

Also, when I started working on the issue the other requests weren't open yet lol

@Tontonio3
Contributor Author

Is there anything I need to do to fix the tests? I'm kinda confused as to why they are failing

Contributor

@mhvk mhvk left a comment

Some comments inline, but also a more philosophical one here. Python usually avoids "Look Before You Leap" (LBYL), in favour of "Easier to Ask Forgiveness than Permission" (EAFP), and I worry a bit when seeing many checks for possible failure modes, which make the code more complex, and also makes things slower for the common case where the user puts in sensible numbers.

For quantile, there is already a check against negative weights, but those are quite important, because they would not lead to a failure, but still give a wrong result.

The case is a little different for inf and all-zero - those give a RuntimeWarning, so users could know something is amiss. But nan gives no warning (indeed, NaNs are allowed in the values...).

Anyway, maybe overall protecting the user is worth it, but we should at least try to make the performance penalty as small as possible, hence my suggestions inline.

But looking closer, there may be another option that would be faster: the weights get transferred down via various functions to _quantile. There, one sees:

# We use the weights to calculate the empirical cumulative
# distribution function cdf
cdf = weights.cumsum(axis=0, dtype=np.float64)
cdf /= cdf[-1, ...] # normalization to 1

So, at this point we have two benefits: any object arrays are already turned into float, and we only have to check the last element to see whether there are any problems, rather than scan the whole array. So, one could add in between these two lines,

if not np.all(np.isfinite(cdf[-1, ...])):
    raise ValueError(...)

For the check for all-zero, one can include it a little below, inside another branch:

        if np.any(cdf[0, ...] == 0):
            # might as well guard against all-zero here.
            if np.any(cdf[-1, ...] == 0):
                raise ValueError(...)

            cdf[cdf == 0] = -1

The above should be better for performance, though I can see the argument for just having nicely validated data to start with.
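
Putting the two suggestions together, the relevant part of _quantile would look roughly like this (a sketch only; the real function has more surrounding logic, and the messages are placeholders):

# Weights have been broadcast and cast by this point; object arrays become float here.
cdf = weights.cumsum(axis=0, dtype=np.float64)
if not np.all(np.isfinite(cdf[-1, ...])):
    # One check of the totals catches both nan and inf weights.
    raise ValueError("weights must be finite")
if np.any(cdf[-1, ...] == 0):
    # An all-zero slice would make the normalization below divide 0 by 0.
    raise ValueError("at least one weight must be non-zero")
cdf /= cdf[-1, ...]  # normalization to 1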

if axis is not None:
    axis = _nx.normalize_axis_tuple(axis, a.ndim, argname="axis")
weights = _weights_are_valid(weights=weights, a=a, axis=axis)
if weights.dtype != object:
Contributor

A general comment: I think these checks should happen inside _weights_are_valid - this will ensure they are used for percentile as well.




@pytest.mark.parametrize(["err_msg", "weight"],
Contributor

I'd parametrize over np.quantile and np.percentile as well - they should have the same errors.

Contributor

Also, if you pass in a list rather than an array, you could parametrize over dtype=float and dtype=object, to make this a little more readable.

Contributor Author

Done

    axis = _nx.normalize_axis_tuple(axis, a.ndim, argname="axis")
weights = _weights_are_valid(weights=weights, a=a, axis=axis)
if weights.dtype != object:
    if np.any(np.isinf(weights)):
Contributor

Another general comment: as written, the common case has to go through a lot of checks. I think it would be better to optimize for the common case, and not worry too much about distinguishing failure cases. E.g., you can do just one evaluation with:

if not np.all(np.isfinite(weights)):
    raise ValueError("weights must be finite.")

# np.isinf and np.isnan do not work on dtype=object arrays.
# Also, dtype=object arrays with np.nan in them break the <, > and == operators,
# so this specific handling had to be done (can be improved).
elif weights.dtype == object:
Contributor

Note that this loop can still give unexpected errors, because you are here counting on object arrays to be turned into their values as scalars. E.g.,

np.isnan(np.array([1.,None,np.inf])[1])
# TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

This will be an uninformative error!

I think we have two choices: just not check for object dtype, or convert to float before checking (and then pass on the error if that conversion fails).


if np.any(weights < 0):
    raise ValueError("Weights must be non-negative.")
elif np.all(weights == 0):
Contributor

Here again we could ensure the common case remains fast by doing:

if np.any(weights <= 0):
    raise ValueError("weights must be non-negative and cannot be all zero.")
    # or, more explicit error messages,
    if np.all(weights == 0):
        raise ValueError("At least one weight must be non-zero.")
    else:
        raise ValueError("Weights must be non-negative.")

Contributor

Trying to keep this inline:

The issue with this is that some of the weights might be 0, but none of them are negative. So it would raise an error even though it shouldn't

You're right, I was too sloppy in writing this; the else should be elif np.any(weights < 0) so that the case of some weights being 0 falls through (slowly, but better than making all cases slow!).

p.s. Given this, I'd probably swap the order, i.e.,

if np.any(weights <= 0):
    # Do these checks guarded by the above `if` to avoid slowing down the common case.
    if np.any(weights < 0):
        raise ValueError("Weights must be non-negative.")
    elif np.all(weights == 0):
        raise ValueError("At least one weight must be non-zero.")

Member

@Tontonio3 I don't see how you responded to this suggestion. Please make sure all reviewer feedback is addressed before requesting re-review.

Contributor Author

@ngoldbaum You're right, I forgot to implement this

@Tontonio3
Contributor Author

A general comment: I think these checks should happen inside _weights_are_valid - this will ensure they are used for percentile as well.

The issue is that _weights_are_valid is used in np.average, which does accept negative weights.

@Tontonio3
Contributor Author

Tontonio3 commented Apr 2, 2025

if np.any(weights <= 0):
    raise ValueError("weights must be non-negative and cannot be all zero.")
    # or, more explicit error messages,
    if np.all(weights == 0):
        raise ValueError("At least one weight must be non-zero.")
    else:
        raise ValueError("Weights must be non-negative.")

The issue with this is that some of the weights might be 0, but none of them are negative. So it would raise an error even though it shouldn't

@Tontonio3
Contributor Author

Tontonio3 commented Apr 2, 2025

Another general comment: as written, the common case has to go through a lot of checks.

I'd love to do that, but the issue is that everything breaks with dtype=object arrays, which is extremely frustrating; it'd be easier to just not allow dtype=object arrays.

@mhvk
Contributor

mhvk commented Apr 2, 2025

A general comment: I think these checks should happen inside _weights_are_valid - this will ensure they are used for percentile as well.

The issue is that _weights_are_valid is used in np.average, which does accept negative weights.

Duh, I thought I had checked that, but I now see I was wrong. Sorry about that!

@mhvk
Contributor

mhvk commented Apr 2, 2025

Another general comment: as written, the common case has to go through a lot of checks.

I'd love to do that, but the issue is that everything breaks with dtype=object arrays, which is extremely frustrating; it'd be easier to just not allow dtype=object arrays.

Yes, indeed; this is partially why I suggested doing the check much further down, when the object arrays are turned into float.

@Tontonio3
Contributor Author

@mhvk I've implemented your suggestions. Although now, if you have an np.nan within an object array, it will give this:
RuntimeWarning: invalid value encountered in greater

The "Improved testing" commit is to save the old error handling.

@Tontonio3
Contributor Author

@seberg @ngoldbaum is there anything else that needs to be done to close this issue?

@ngoldbaum
Member

The linter still needs to be fixed.

@ngoldbaum ngoldbaum added this to the 2.3.0 release milestone Apr 25, 2025
Member

@ngoldbaum ngoldbaum left a comment

Overall looks good to me. Some minor comments inline.

if np.any(weights <= 0):
    if np.any(weights < 0):
        raise ValueError("Weights must be non-negative.")
    elif np.all(weights == 0):
Member

can't you just say else here?

Contributor Author

I can't just say else here, as the first if will be true if only one of the weights is 0, which is valid

Member

you can move the == 0 to the sum to simplify this, though.

IMO the checks may be clearer as not all(weights >= 0) to reject NaNs right away, but it doesn't matter much so long as the == 0 case is handled below.

Also, why not use isfinite?
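
(For reference, the not all(weights >= 0) form catches NaNs because any comparison with NaN is False:)

import numpy as np

w = np.array([1.0, np.nan, 2.0])
bool(np.all(w >= 0))   # False, since nan >= 0 is False
# so `not np.all(w >= 0)` is True and would trigger the ValueError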

Contributor Author

If the weights are non-zero, but the array that they are given in is all zeroes, the sum will be zero. I know it is a specific edge case, but it would give an error when it shouldn't give one.

Member

I don't understand what the array matters, the below calculation is on weights?

The point is that after the cumsum, we need to check the cumsum result is finite and not zero. In fact you could even do that after normalization (at that point it is always NaN, although a warning would have occurred on the way if the input was inf or zero and not NaN).

Here, the check is incorrect. It checks that all weights are zero, but the thing we need to check is whether any slice is empty. Only the negative makes sense here.
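
In other words, roughly (a sketch of checking after normalization; the actual code differs):

cdf = weights.cumsum(axis=0, dtype=np.float64)
cdf /= cdf[-1, ...]  # 0 total -> 0/0 -> nan; inf total -> inf/inf -> nan; nan total -> nan
if np.any(np.isnan(cdf[-1, ...])):
    raise ValueError("invalid weights (all-zero slice, nan, or inf)")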

q = 0.5
arr = [1, 2, 3, 4]
wgts = np.array(weight, dtype=dty)
with pytest.raises(err):
Member

might as well use match here too

@charris
Member

charris commented May 19, 2025

@ngoldbaum Still look good to you?

@charris charris modified the milestones: 2.3.0 release, 2.4.0 release May 21, 2025
@jorenham
Member

This could use a rebase

Member

@seberg seberg left a comment

Hmmm, I guess we forgot it, but it still needs a tweak at least. The check for all == 0 is still in the wrong place, it should be with the other checks.


@charris
Member

charris commented Nov 20, 2025

Needs rebase.

@seberg
Member

seberg commented Nov 28, 2025

OK, I fixed this up; I didn't bother with having a nice error. Frankly, I am not even sure we want an error (as opposed to a NaN return), but I assume this was discussed, or at least the result was wrong right now.
The main thing is that the all-zeros check should have failed the test that now has one slice that is all zeros.

I don't think Nathan can have a look quickly, so not sure who might want to review my changes.

The history is a mess, please just squash merge.


* All weights are zero
* At least one weight is `np.nan`
* At least one weight is `np.inf`
Member

File name is incorrect, should be 28595.improvement.rst.

@mhvk
Contributor

mhvk commented Nov 28, 2025

@seberg - I'll try to have a last review too, probably over the weekend or early next week...

@charris charris merged commit afd1cce into numpy:main Nov 30, 2025
74 checks passed
@charris
Member

charris commented Nov 30, 2025

I'm just going to put this in, it is a corner case and if it needs improvement we can do that later.

@charris
Member

charris commented Nov 30, 2025

Thanks @Tontonio3, @seberg, @mhvk .

cakedev0 pushed a commit to cakedev0/numpy that referenced this pull request Dec 5, 2025
* added err messages and tests

* Modified tests and added release note

* Fixed tests

* Fixed bug to handle object dtypes

* Fixed bug to handle object dtypes

* Streamlined testing, improved error handling capabilities

* Changed infinite error message

* Bug fix

* Fixed lint test

* Improved testing

* Changed error handling, made it faster, removed dtype=object special cases

* More comprehensive testing

* More comprehensive testing

* Fixed lint

* Fixed tests

* Fixed CircleCI test

* streamlined checks

* lint fix

* lint fix

* Fix and simplify things (but don't bother with being overly specific)

* Appease linter gods and also use one valid entry for other test

* rename release note fragment

---------

Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
IndifferentArea pushed a commit to IndifferentArea/numpy that referenced this pull request Dec 7, 2025