BUG: RuntimeWarning emitted sporadically during __setitem__ on a masked array #23000

@jrbourbeau

Describe the issue:

Over in Dask's CI we are sporadically encountering a RuntimeWarning when running this test, specifically on this line, where we have a __setitem__ call on a masked array.

We elevate warnings from numpy to errors in the Dask test suite, so we get the following traceback (here's an example CI build where we see this):

_______________ test_setitem_extended_API_2d_mask[index1-value1] _______________
[gw0] darwin -- Python 3.11.0 /Users/runner/miniconda3/envs/test-environment/bin/python3.11

index = (slice(1, 5, 2), [7, 5])
value = masked_array(
  data=[[--, --],
        [--, --]],
  mask=[[ True,  True],
        [ True,  True]],
  fill_value=1e+20,
  dtype=float64)

    @pytest.mark.parametrize(
        "index, value",
        [
            [(1, slice(1, 7, 2)), np.ma.masked],
            [(slice(1, 5, 2), [7, 5]), np.ma.masked_all((2, 2))],
        ],
    )
    def test_setitem_extended_API_2d_mask(index, value):
        x = np.ma.arange(60).reshape((6, 10))
        dx = da.from_array(x.data, chunks=(2, 3))
        dx[index] = value
>       x[index] = value

dask/array/tests/test_array_core.py:4167: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = masked_array(
  data=[[                   0,                    1,
                            2,                    3...    56,                   57,
                           58,                   59]],
  mask=False,
  fill_value=999999)
indx = (slice(1, 5, 2), [7, 5])
value = masked_array(
  data=[[--, --],
        [--, --]],
  mask=[[ True,  True],
        [ True,  True]],
  fill_value=1e+20,
  dtype=float64)

    def __setitem__(self, indx, value):
        """
        x.__setitem__(i, y) <==> x[i]=y
    
        Set item described by index. If value is masked, masks those
        locations.
    
        """
        if self is masked:
            raise MaskError('Cannot alter the masked element.')
        _data = self._data
        _mask = self._mask
        if isinstance(indx, str):
            _data[indx] = value
            if _mask is nomask:
                self._mask = _mask = make_mask_none(self.shape, self.dtype)
            _mask[indx] = getmask(value)
            return
    
        _dtype = _data.dtype
    
        if value is masked:
            # The mask wasn't set: create a full version.
            if _mask is nomask:
                _mask = self._mask = make_mask_none(self.shape, _dtype)
            # Now, set the mask to its value.
            if _dtype.names is not None:
                _mask[indx] = tuple([True] * len(_dtype.names))
            else:
                _mask[indx] = True
            return
    
        # Get the _data part of the new value
        dval = getattr(value, '_data', value)
        # Get the _mask part of the new value
        mval = getmask(value)
        if _dtype.names is not None and mval is nomask:
            mval = tuple([False] * len(_dtype.names))
        if _mask is nomask:
            # Set the data, then the mask
>           _data[indx] = dval
E           RuntimeWarning: invalid value encountered in cast

../../../miniconda3/envs/test-environment/lib/python3.11/site-packages/numpy/ma/core.py:3371: RuntimeWarning
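
The failing line assigns `dval`, the raw data buffer behind `value`, into the integer `_data` array, so a float-to-integer cast takes place even though every element of `value` is masked. Below is a rough sketch of that cast in isolation. It rests on an assumption that this report does not confirm: that the buffer behind `np.ma.masked_all` may hold arbitrary values, NaN among them. The `target` array and the explicit NaN fill are illustrative stand-ins, not part of the Dask test.

```python
import warnings

import numpy as np

# The value from the failing test: fully masked, float64 by default.
value = np.ma.masked_all((2, 2))

# Same lookup as in MaskedArray.__setitem__: take the underlying data buffer.
dval = getattr(value, "_data", value)
print(dval.dtype)

# Hypothetical stand-in for the integer `_data` array being assigned into.
target = np.arange(4).reshape((2, 2))

# Assumption: the masked buffer could contain NaN. Assign NaN explicitly and
# record whatever warnings the float-to-integer cast emits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    target[...] = np.full((2, 2), np.nan)

print([str(w.message) for w in caught])
```

If the recorded list contains "invalid value encountered in cast" on the NumPy version under test, that would be consistent with the warning depending on whatever happens to sit in the masked buffer, which might explain why it only shows up sporadically.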

Unfortunately, we haven't been able to reproduce the issue locally (we have only seen it in CI), but I thought it was still worth opening an issue in case NumPy devs have any insight into what might be happening.

I initially thought this might be a macOS-specific issue (possibly similar to conda-forge/numpy-feedstock#229), but we've also observed this failure on Windows (see this CI build).

Reproduce the code example:

# This is the relevant snippet extracted from the `dask/dask` test
import numpy as np

x = np.ma.arange(60).reshape((6, 10))
index = (slice(1, 5, 2), [7, 5])
value = np.ma.masked_all((2, 2))
x[index] = value
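
Because the Dask test suite elevates NumPy warnings to errors, the snippet above only fails loudly under the same warning filter. Here is a minimal sketch of a harness that repeats the assignment with `RuntimeWarning` turned into an error. The helper name `setitem_raises` and the repeat count of 1000 are illustrative and not taken from the Dask test.

```python
import warnings

import numpy as np


def setitem_raises(value):
    """Return True if assigning `value` emits a RuntimeWarning."""
    x = np.ma.arange(60).reshape((6, 10))
    index = (slice(1, 5, 2), [7, 5])
    with warnings.catch_warnings():
        # Mirror the test-suite behaviour: RuntimeWarning becomes an exception.
        warnings.simplefilter("error", RuntimeWarning)
        try:
            x[index] = value
        except RuntimeWarning:
            return True
    return False


# Build a fresh fully masked value each time, since the warning is sporadic.
hits = sum(setitem_raises(np.ma.masked_all((2, 2))) for _ in range(1000))
print(f"{hits} of 1000 assignments emitted a RuntimeWarning")
```

A count of zero on a given machine does not rule the problem out; it only means the warning did not trigger in that run.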

Error message:

No response

Runtime information:

In [1]: import sys, numpy; print(numpy.__version__); print(sys.version)
1.24.1
3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:51) [Clang 14.0.4 ]

In [2]: import numpy

In [3]: print(numpy.show_runtime())
[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2',
                                'AVX512F',
                                'AVX512CD',
                                'AVX512_SKX',
                                'AVX512_CLX',
                                'AVX512_CNL',
                                'AVX512_ICL'],
                      'not_found': ['AVX512_KNL']}},
 {'architecture': 'Haswell',
  'filepath': '/Users/james/mambaforge/envs/bad-env/lib/libopenblasp-r0.3.21.dylib',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'openmp',
  'user_api': 'blas',
  'version': '0.3.21'},
 {'filepath': '/Users/james/mambaforge/envs/bad-env/lib/libomp.dylib',
  'internal_api': 'openmp',
  'num_threads': 8,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None}]
None

Note this is from my local machine where I'm not able to reproduce the issue.

From this dask/dask CI build, where the RuntimeWarning is emitted, I'm able to see numpy=1.24.1=py311h62c7003_0 from conda-forge is being used.

Context for the issue:

No response
