Description
Describe the issue:
EDIT: See the comment below on np.ma.ptp().
np.ptp() changed its behavior (compared to the old ndarray.ptp() method) when used on masked arrays with the axis parameter, in cases where the expected result is a partially masked array. It ignores the mask in the computation and returns a masked array whose mask is not set (the mask is just False).
Note that the second part of the example below (a 1-d array, without the reshape and axis parameter) behaves even more oddly. The old behavior was to return a scalar numpy.int64. The new behavior is to return a masked array with a single item and no mask, and its value includes the masked entries (11 instead of 5).
I believe this is a bug, since neither the maintenance_2.x changelog nor the issue tracker mentions any behavior change for ptp.
I stumbled across this in our test suite while getting our downstream package obspy ready for NumPy 2.0.
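For reference, here is a sketch of how to get the partially masked result expected for the (3, 4) case. It reduces with the masked array's own max and min methods, which skip masked entries, and it shows that np.ma.ptp returns the same thing. Running plain np.ptp on the raw buffer (x.data) reproduces the mask-ignoring numbers.

```python
import numpy as np

x = np.ma.masked_less(np.arange(12).reshape((3, 4)), 6)

# MaskedArray.max/min skip masked entries; a row whose entries are all
# masked gives a masked result, and the subtraction keeps that mask.
expected = x.max(axis=1) - x.min(axis=1)
print(expected)                      # [-- 1 3]
print(np.ma.getmaskarray(expected))  # [ True False False]
print(np.ma.ptp(x, axis=1))          # [-- 1 3]

# The same reduction on the raw buffer, masked values included.
print(np.ptp(x.data, axis=1))        # [3 3 3]
```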
Reproduce the code example:
import numpy as np
# example with reshape and axis
x = np.arange(12).reshape((3, 4))
x = np.ma.masked_less(x, 6)
print(x)
ptp = np.ptp(x, axis=1)
print(ptp)
print(ptp.mask)
# example with a 1d array and no axis parameter
x = np.arange(12)
x = np.ma.masked_less(x, 6)
print(x)
ptp = np.ptp(x)
print(ptp)
print(type(ptp))
print(ptp.mask)
Error message:
## Old output numpy 1.23.2
#### example with reshape and axis
[[-- -- -- --]
[-- -- 6 7]
[8 9 10 11]]
[-- 1 3]
[ True False False]
#### example with 1d array
[-- -- -- -- -- -- 6 7 8 9 10 11]
5
<class 'numpy.int64'>
Traceback (most recent call last):
File "/tmp/npptp.py", line 9, in <module>
print(ptp.mask)
AttributeError: 'numpy.int64' object has no attribute 'mask'
## New output numpy 2.0.0rc2
#### example with reshape and axis
[[-- -- -- --]
[-- -- 6 7]
[8 9 10 11]]
[3 3 3]
False
#### example with 1d array
[-- -- -- -- -- -- 6 7 8 9 10 11]
11
<class 'numpy.ma.MaskedArray'>
False
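The two values printed for the 1-d example can be reproduced separately. 5 is the spread of the unmasked values 6..11, and 11 is the spread of the whole buffer 0..11 with the masked entries included. The compressed() variant shown last is one more way to get the mask-aware value from plain np.ptp.

```python
import numpy as np

x = np.ma.masked_less(np.arange(12), 6)

# Mask-aware peak-to-peak over the unmasked values 6..11.
print(np.ma.ptp(x))             # 5, the value printed by numpy 1.23.2
# Peak-to-peak over the raw buffer, masked values included (0..11).
print(np.ptp(x.data))           # 11, the value printed by numpy 2.0.0rc2
# Dropping the masked values first also gives the mask-aware answer.
print(np.ptp(x.compressed()))   # 5
```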
Python and NumPy Versions:
The output above compares NumPy 1.23.2 and 2.0.0rc2.
Runtime Environment:
[{'numpy_version': '2.0.0rc2',
'python': '3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) '
'[GCC 12.3.0]',
'uname': uname_result(system='Linux', node='mother', release='5.10.0-25-amd64', version='#1 SMP Debian 5.10.191-1 (2023-08-16)', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM', 'AVX512_SPR']}},
{'architecture': 'SkylakeX',
'filepath': '/home/megies/miniconda3/envs/np2/lib/libopenblasp-r0.3.27.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]
Context for the issue:
Well, this seems to make np.ptp unusable on masked arrays. Restoring the old behavior would need major contortions, such as filling masked values with the minimum along that axis and then copying the mask over to the result, which doesn't seem to be what NumPy wants. Masked arrays are supposed to "just work", and this issue breaks that rule.
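The following is a rough sketch of that workaround built on plain np.ptp, to make the contortion concrete. ptp_via_fill is a hypothetical helper and not part of NumPy. It assumes a single integer axis and relies on MaskedArray.min(..., keepdims=True). For comparison, the sketch also prints np.ma.ptp, which gives the mask-aware result directly.

```python
import numpy as np


def ptp_via_fill(x, axis):
    """Hypothetical helper (not a NumPy API): mask-aware peak-to-peak
    built on plain np.ptp, following the workaround described above."""
    x = np.ma.asarray(x)
    mask = np.ma.getmaskarray(x)
    # Minimum of the unmasked values along `axis`; keepdims=True lets it
    # broadcast back against `x`. Fully masked slices get a placeholder value.
    slice_min = np.ma.getdata(x.min(axis=axis, keepdims=True))
    # Masked entries become the slice minimum, so they cannot widen max - min.
    filled = np.where(mask, slice_min, np.ma.getdata(x))
    result = np.ptp(filled, axis=axis)
    # Copy the mask over: a result is masked only if its whole slice was masked.
    return np.ma.array(result, mask=mask.all(axis=axis))


x = np.ma.masked_less(np.arange(12).reshape((3, 4)), 6)
print(ptp_via_fill(x, axis=1))   # [-- 1 3]
print(np.ma.ptp(x, axis=1))      # [-- 1 3]
```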
EDIT: When migrating code that encounters a mix of regular and masked arrays at runtime, simply applying the change proposed in the migration guide ("ptp -- Use np.ptp(arr, ...) instead") will break users' code if they don't realize that np.ma.ptp() is needed for masked arrays.
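One way to keep such mixed code paths working is to dispatch on the array type. This sketch shows one possible approach for a downstream project. ptp_any is a hypothetical helper, not a NumPy API.

```python
import numpy as np


def ptp_any(a, axis=None):
    """Hypothetical helper (not a NumPy API): route masked arrays to the
    mask-aware np.ma.ptp and everything else to np.ptp."""
    if isinstance(a, np.ma.MaskedArray):
        return np.ma.ptp(a, axis=axis)
    return np.ptp(a, axis=axis)


print(ptp_any(np.arange(12)))                         # 11
print(ptp_any(np.ma.masked_less(np.arange(12), 6)))   # 5
```

Dispatching, rather than always calling np.ma.ptp, is meant to keep plain ndarray inputs plain. As far as I can tell, np.ma.ptp handles a plain ndarray by converting it to a masked array, so axis reductions would come back as MaskedArray.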