MAINT: stats.wilcoxon: fix failure with multidimensional x with NaN and slice length > 50 #20592
@tirthasheshpatel This is a pretty short fix; do you have a moment to take a look before we branch 1.13.1?
tirthasheshpatel
left a comment
LGTM! Thanks for the fix!
Just to confirm: Passing each slice individually would return z depending on which method it branches to but the behavior is unchanged for vectorized (multi-dimensional) inputs, right?
Er... no, or that's what happened in 1.13. This fixes that.

    import numpy as np
    import scipy
    from scipy import stats
    x = np.arange(1, 100)
    scipy.__version__  # 1.12.0
    # branches to `approx`, but does not return `zstatistic`
    hasattr(stats.wilcoxon(x, method='auto'), 'zstatistic')  # False

In SciPy 1.13.0/main:

    import numpy as np
    import scipy
    from scipy import stats
    x = np.arange(1, 100)
    scipy.__version__  # 1.13.0
    # branches to `approx` and returns `zstatistic` - this is the problem
    hasattr(stats.wilcoxon(x, method='auto'), 'zstatistic')  # **True**

Because

    rng = np.random.default_rng(25893459825282452)
    x = rng.random((2, 55))
    x[1, :10] = np.nan
    # Slice is length 55 > 50, so method='approx' is used, and `zstatistic` is added
    hasattr(stats.wilcoxon(x[0], method='auto', nan_policy='omit'), 'zstatistic')  # True
    # Slice after removing NaNs is length 45 < 50, so method='exact' is used, and `zstatistic` is not added
    hasattr(stats.wilcoxon(x[1], method='auto', nan_policy='omit'), 'zstatistic')  # False
    # Error because one slice has three return values and the other has only two
    stats.wilcoxon(x, axis=1, method='auto', nan_policy='omit')  # error

By ensuring that
Oh, OK. That makes sense. Thanks for the clarification! Tests look good too, so merging! Thanks!
Reference issue
Closes gh-20591
What does this implement/fix?
This fixes a bug in which under specific conditions (`x.ndim > 1`, `x.shape[axis] > 50`, `np.isnan(x).any()`, `nan_policy='propagate'`, and `method='auto'`, for instance) `wilcoxon` could fail and raise an unexpected error.
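The failure mode discussed in this thread can be illustrated without SciPy. The following is only a rough sketch in plain Python: `per_slice_result`, `stack_results`, and the `always_return_z` flag are hypothetical names invented for this illustration, the returned numbers are placeholders rather than real test statistics, and padding the third value with NaN is just one possible way to keep the return shape consistent (it is an assumption, not necessarily what this PR does).

```python
import math


def per_slice_result(sample, always_return_z):
    """Hypothetical stand-in for a per-slice test; NOT SciPy's implementation."""
    # Mimic nan_policy='omit': drop NaNs before choosing a method.
    clean = [v for v in sample if not math.isnan(v)]
    # Mimic method='auto': longer slices take the 'approx' branch.
    use_approx = len(clean) > 50
    # Placeholder values, not real statistics.
    statistic, pvalue = float(len(clean)), 0.5
    if use_approx:
        # The third value plays the role of `zstatistic`.
        return (statistic, pvalue, 0.0)
    if always_return_z:
        # Assumed remedy: pad with NaN so every slice returns three values.
        return (statistic, pvalue, math.nan)
    return (statistic, pvalue)


def stack_results(rows, always_return_z):
    """Combine per-slice results column-wise, as a vectorizing wrapper might."""
    results = [per_slice_result(row, always_return_z) for row in rows]
    if len({len(r) for r in results}) != 1:
        raise ValueError("slices returned different numbers of values")
    return [list(column) for column in zip(*results)]


# Two slices of length 55; the second has 10 NaNs, leaving 45 observations.
rows = [
    [float(i) for i in range(1, 56)],
    [math.nan] * 10 + [float(i) for i in range(1, 46)],
]

try:
    stack_results(rows, always_return_z=False)
except ValueError as exc:
    print("inconsistent return shape:", exc)

print(stack_results(rows, always_return_z=True))
```

With `always_return_z=False`, the two slices take different branches and return tuples of different lengths, so they cannot be combined. With `always_return_z=True`, every slice returns three values and stacking succeeds.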